首页 | 本学科首页   官方微博 | 高级检索  
     


Combined classifier for cross-project defect prediction: an extended empirical study
Authors:Yun Zhang  David Lo  Xin Xia  Jianling Sun
Affiliation:1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China2. School of Information Systems, SingaporeManagement University, Singapore 641674, Singapore
Abstract:
To facilitate developers in effective allocation of their testing and debugging efforts, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict classes that are more likely to be buggy based on the past history of classes, methods, or certain other code elements. These techniques are effective provided that a sufficient amount of data is available to train a prediction model. However, sufficient training data are rarely available for new software projects. To resolve this problem, cross-project defect prediction, which transfers a prediction model trained using data from one project to another, was proposed and is regarded as a new challenge in the area of defect prediction. Thus far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this study, we investigated seven composite algorithms that integrate multiple machine learning classifiers to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, we performed experiments on 10 open-source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. We compared the composite algorithms with the combined defect predictor where logistic regression is used as the meta classification algorithm (CODEP Logistic ), which is the most recent cross-project defect prediction algorithm in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEP Logistic : Maximum voting shows the best performance in terms of F-measure and its average F-measure is superior to that of CODEP Logistic by 36.88%. Bootstrap aggregation (BaggingJ48) shows the best performance in terms of cost effectiveness and its average cost effectiveness is superior to that of CODEP Logistic by 15.34%.
Keywords:defect prediction  cross-project  classifier combination  
本文献已被 SpringerLink 等数据库收录!
点击此处可从《Frontiers of Computer Science》浏览原始摘要信息
点击此处可从《Frontiers of Computer Science》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号