DEFECTS - Comparable and Externally Valid Software Defect Prediction


The comparability and reproducibility of empirical software engineering research is, for the most part, an open problem, and software defect prediction is no exception. Current research shows that this causes real problems for the external validity of defect prediction research: multiple replications conducted by different groups of researchers led to findings that differ from prior research. Moreover, problems with the currently used data sets were discovered, and it was demonstrated that these problems may change conclusions. Thus, defect prediction research faces a replication crisis if these problems are ignored. Within this project, we plan to create a solid foundation for comparable and externally valid defect prediction research. Our approach rests on three pillars.

The first pillar is the quality of the data we use for defect prediction experiments. Current studies on data quality do not cover the impact of mislabeled data. This kind of noise affects not only the creation of defect prediction models, but also their evaluation. We will statistically evaluate the noise in current data sets. Based on our findings, we will improve the state of the art of defect labeling and generate a large data set with less noise. The quality of our data will be statistically validated. The collected body of data will be larger than the available defect prediction data sets and will thereby facilitate better generalizability and external validity of results.

The second pillar is the replication of the current state of the art. Since prior replications already contradicted the original experiments, we believe that a broader replication effort is necessary. Current replications consider only parts of the state of the art, e.g., classifier impact or cross-project defect prediction. Most of the state of the art has never been replicated and diligently compared to other approaches or naïve baselines. Most experiments used only small data sets, which is a key factor in the problems with external validity. We will conduct a conceptual replication of the state of the art of defect prediction. Through this, we will improve the external validity of the defect prediction state of the art and lay the groundwork for better external validity of future work.

The third pillar is guidelines for defect prediction research. If we cannot get researchers to avoid the anti-patterns that led to poor validity of results, our efforts to combat the replication crisis in defect prediction research will only have a short-term effect. To make our results sustainable, we will work together with the defect prediction community to define guidelines that allow researchers to conduct their defect prediction experiments in such a way that we hopefully never face such problems with replicability again.

Project Details

Project Staff: Steffen Herbold, Alexander Trautsch
Funding Organizations: Deutsche Forschungsgemeinschaft (DFG)

2024 © Software Engineering For Distributed Systems Group
