Abstract: Software defect prediction helps a software development team allocate testing resources efficiently and better understand the root causes of defects. It can also help explain why a project is failure-prone. This paper applies binary classification to predict whether a software component contains a bug, using three widely used machine learning algorithms: Random Forest (RF), Neural Networks (NN), and Support Vector Machine (SVM). It investigates how well these algorithms address the challenging problem of predicting defects in software components. To that end, the paper combines source code metrics and process metrics as defect indicators for the Eclipse environment and applies the three algorithms to a sample of weekly Eclipse features. The paper also addresses the problem of high data dimensionality, and our results confirm that predictive capability is retained when dimensionality reduction techniques such as Variable Importance (VI) and Principal Component Analysis (PCA) are applied. In our case, the results obtained with only two features (NBD_max and Pre-defects) are comparable to those obtained with all 61 features. Furthermore, we evaluate the performance of the three algorithms on the data; Neural Networks and Random Forest turned out to fit best.
Other keywords: Software defect prediction, data analysis, Eclipse, machine learning techniques.