Seminar Software Evolution: Mining Software Repositories (WS2010)

Teaching Staff: Jens Grabowski, Philip Makedonski, Steffen Herbold

Course Organization

Time, Place, ECTS, etc.:
An introductory session will take place on Friday, 29.10.2010, at 10:15 in Room 2.101 at the Institute of Computer Science (Goldschmidtstr. 7), during which course topics and organization details will be discussed. At the end of the session, topics and presentation session appointments will be assigned to the attending participants. The presentation session appointments will be announced on this page. It is possible to select a topic at a later point as well.

The presentation sessions will begin in December or January (depending on the number of participants), and will take place on Fridays, weekly, starting at 10:15, in room 2.101 (preliminary, may be subject to change).

Successful participation in the course is awarded 4 ECTS points. Additional information is also available at UniVZ.


Language, Registration, Participation, Attendance, etc.:
The language in this course will be English. Due to the nature of the course and the available materials, it is not feasible to carry out the course in German.

The number of participants is limited, so early registration is recommended. Please contact Philip Makedonski (by e-mail or in person) for preliminary registration.

Since this is a seminar-type course, participation in the presentation sessions is mandatory. Participation in the introductory session, although recommended, is not required.

Course Description

Changes in usage requirements and in the technological landscape, among other factors, create a continuous need to change software systems so that they remain viable and operable in changing environments.

In this seminar we will deal with evolutionary studies in information science and technology, in particular in software engineering. The focus of this semester's seminar will be on mining software repositories, which is an integral part of nearly every evolutionary study in software engineering.

Passing Requirements

The requirements for the successful completion of the course can be summarized as follows:

  • Presentation on a selected topic (30 minutes + 10-15 minutes discussion)
  • Written report on the selected topic (12-15 pages of content in length)
  • Peer-reviews of written reports by other participants
  • Integration of the peer-review remarks into the final version of the written report
  • Active participation in all presentation sessions!

Some general guidelines for the individual requirements are listed in the respective section below. Additional specific tips and suggestions will be provided by the tutors.


In this edition of the Software Evolution course, we will focus on one of the essential steps in the study and analysis of software change: gathering and managing the necessary data. There are usually various sources of information related to a piece of software, such as source code version control systems, bug databases, time and effort tracking systems, tests, change request systems, requirements management systems, and mailing lists. Together they contain the pieces of the puzzle of how the software has become what it is and what it may become in the future in terms of, for example, quality, features, and life cycle. These pieces need to be identified and put together in order to get the complete picture. This course will deal with exactly that: identifying and relating the relevant pieces. We will familiarize ourselves with the means and methods for mining software repositories, as well as the problems and challenges associated with this task.

On a more general level, this course is intended to introduce the participants to the basics of conducting independent research and self-study on a given topic. Participants will have to gather and process (a subset of) the available information on the topic and present it in a suitable form, respecting a number of guidelines (see below). The accumulated knowledge can then serve them in subsequent research in this or related fields, and/or in their own software engineering and development practice.

Furthermore, this semester's course seeks to familiarize the participants with peer reviewing practices that are an essential part of research. Similar practices are also well established in many business environments.

Target Audience

In general, the subject area of software evolution is relevant to everyone (looking to be) involved in the field of software engineering. This course is primarily intended for advanced Master's students interested in the subject matter and possible further research in this area (Master's theses, PhD projects), but also for dedicated students looking to extend their knowledge in software engineering and learn about useful practices they can later apply in their professional lives.

It is also possible to attend the course during the Bachelor's course of studies. However, Bachelor's students interested in the course are encouraged to contact Philip Makedonski prior to registration for individual advice on whether the course is suitable for them.

Topics (preliminary)

The topics in this course will focus on the following major areas:

  • Comparison of projects
  • Defect analysis and prediction
  • Version control and infrastructure
  • Beyond source code - text analysis
  • Search and recommendation
  • Changes and clones
  • Impact analysis
  • Practical applications and experiments
  • Available resources
  • Visualization and presentation of results
  • Patterns and models
  • Integration and collaboration (process-related and social aspects)

Further areas and topics may be covered during the course as well. The precise formulation of the assigned topics can be adjusted further individually.

Guidelines and Hints

Preparing a presentation:

  • 30 minutes may seem like a lot, but it is not! Choose a focal point for the talk and limit other details. The time is too short to present a topic in full detail. Focus on the main concepts and on what makes the presented work interesting.
  • 15-20 slides should be sufficient. A slide usually takes about 2 minutes on average, so for a 30-minute talk it is safer to stay near the lower end of that range. Less is often more in this context.
  • Do not pack slides full of text. Clear pictures and diagrams are often helpful in expressing the intended ideas. When using text, focus on the essence. Again, less is often more in this context.
  • Examples are always helpful; however, do not overdo them.
  • Consider carefully whether each part is really necessary and whether it adds any value to the talk.
  • Interact with the audience: see how they react to different parts of the talk, and ask for affirmation on complicated parts.
  • Evaluation of the topic: present your own independent position on the topic, the related materials, possibly also other opinions, your justification and reasoning behind your choice of topic, as well as related topics.
  • Origins and placement of the presented work: who are the authors, how did the presented work come to be, on what foundations is it based, and what other works build upon it? The presentation should answer all of these questions.
  • Seek feedback from your peers and tutors.

The presentation slides should be provided to the tutor at the latest one week before the presentation appointment. This ensures that the participants have enough time to incorporate any feedback into the final version of the slides. In addition, participants are encouraged to discuss early versions of the presentations with the tutor to get some early tips.


Preparing a report:

  • 12-15 pages of content (not including title, table of contents, references, etc.)
  • Include sufficient background of the problem setting
  • Present available approaches and methods in detail. Closely related approaches may find their place too, provided there is good justification for that.
  • Evaluation and assessment of the presented approaches, as well as of the topic as a whole, is essential. Feedback from the presentation and the ensuing discussion may be very useful at this point. The placement of the presented work should be covered as well: how it came to be, who the authors are, the foundations it builds upon, and further work based on it.
  • An outlook on potential future developments on the topic and related topics
  • Final version due: end of the semester.

Participants are requested to submit a preliminary version of the final report for peer review by the end of February. The preliminary reports will be randomly assigned to the participants for peer review (one review per participant). The peer reviews are due two weeks after the assignment, after which the original authors have another two weeks to incorporate the reviewer's remarks into what will become the final version of the report.


Preparing a review:

A standard review form will be made available on the web page before the review round starts. In addition, free-form remarks and corrections of 1-3 pages are expected from each participant. They should generally include remarks related to:

  • Language
  • Style
  • Content

Please note that the goal of the review round is to improve the reports. The reviews should therefore be as comprehensive as possible.


The presentation sessions schedule will be announced shortly after the introductory session.


A good starting point on the topic of software evolution is the Software Evolution book by Tom Mens and Serge Demeyer (Springer, 2008, ISBN 978-3-540-76439-7). A few copies are available from the students' library at the institute. For this edition of the course, Chapters 3, 4, and 11 are of particular interest, potentially also Chapters 2 and 8, as well as the introductory Chapter 1. The book also contains a number of references to further resources.

In this edition of the course, the focus will be primarily on mining software repositories. One of the best sources for information on current research in this field is the annual IEEE Working Conference on Mining Software Repositories (MSR). In this course we will be focusing on selected topics from the recent editions of this conference.

Regarding software evolution in general, the following resources may be of interest:

Software Aging and Software Maintenance are also fields closely related to Software Evolution, which provide an even broader spectrum of research for the curious of mind.

Mining Software Repositories Challenge

The IEEE Working Conference on Mining Software Repositories hosts an annual mining challenge. The goal is to bring research and industry closer by analyzing common data sets.

The open challenge for 2011 is focused on the comparison of projects. Four projects have been selected:

  • Group 1: IDEs - Eclipse and NetBeans (written in Java)
  • Group 2: Web Browsers - Firefox and Chrome (written in C/C++)

The aim is to discover interesting similarities and/or differences in the projects. Further information is available at the MSR 2011 challenge website.

Students interested in gaining practical experience in the field of mining software repositories are encouraged to participate in the MSR challenge. For further information regarding participation and guidance, please contact Philip Makedonski (by e-mail or in person).

2024 © Software Engineering For Distributed Systems Group