Semi-Supervised Ensemble Learning Approach for Cross-Project Defect Prediction


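Cite this article:	HE Ji-Yuan, MENG Zhao-Peng, CHEN Xiang, WANG Zan, FAN Xiang-Yu. Semi-Supervised Ensemble Learning Approach for Cross-Project Defect Prediction[J]. Journal of Software, 2017, 28(6): 1455-1473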

Authors:	HE Ji-Yuan, MENG Zhao-Peng, CHEN Xiang, WANG Zan, FAN Xiang-Yu

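Affiliations:	Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China; Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China; School of Computer Science and Technology, Nantong University, Nantong, Jiangsu 226019, China; Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China; Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China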

Funding:	National Natural Science Foundation of China (61202030, 61373012, 61202006, 71502125)

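Abstract:	Software defect prediction can optimize the allocation of testing resources by identifying, early in a project's development, the software modules that are likely to contain defects. Early defect prediction research focused mostly on within-project defect prediction, which requires sufficient historical data. In practice, however, the project to be predicted may have little historical data or may be entirely new. Cross-project defect prediction has therefore become a hot research topic in the field. Its main challenges are the distribution difference between the source-project and target-project datasets and the class imbalance within datasets. Inspired by search-based software engineering, this paper proposes S³EL, a search-based semi-supervised ensemble approach to cross-project defect prediction. The approach first builds multiple Naïve Bayes base classifiers by adjusting the distribution ratio of the classes in the training set. It then uses a genetic algorithm, which has global search capability, to combine these base classifiers using a small number of labeled target instances, producing the final defect prediction model. S³EL was compared with several classical cross-project defect prediction methods (Burak filter, Peters filter, TCA+, CODEP, and HYDRA) on the Promise and AEEEM datasets. With F1 as the evaluation metric, the results show that S³EL achieves the best prediction performance in most cases.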
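Keywords:	cross-project software defect prediction; semi-supervised learning; ensemble learning; genetic algorithm; Naïve Bayes
Received:	2016-07-28
Revised:	2016-10-10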
Semi-Supervised Ensemble Learning Approach for Cross-Project Defect Prediction

HE Ji-Yuan, MENG Zhao-Peng, CHEN Xiang, WANG Zan and FAN Xiang-Yu. Semi-Supervised Ensemble Learning Approach for Cross-Project Defect Prediction[J]. Journal of Software, 2017, 28(6): 1455-1473

Authors:	HE Ji-Yuan, MENG Zhao-Peng, CHEN Xiang, WANG Zan, FAN Xiang-Yu

Affiliation:	Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China; Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China; School of Computer Science and Technology, Nantong University, Nantong 226019, China; Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China; Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China

Abstract:	Software defect prediction helps developers allocate testing resources by predicting whether a software module is defect-prone. Most defect prediction research focuses on within-project defect prediction, which needs sufficient training data from the same project. In real software development, however, the project that needs defect prediction is often new or has little or no historical data. Cross-project defect prediction, which trains on data from other projects and predicts defects in the target project, has therefore become a hot research topic. Its main challenges are the difference in data distribution between source and target projects and the class imbalance within datasets. Inspired by search-based software engineering, this paper proposes S³EL, a search-based semi-supervised ensemble learning approach. By adjusting the class distribution ratio in the training dataset, we build several Naïve Bayes classifiers as base learners. We then use a genetic algorithm and a small number of labeled target instances to combine these base classifiers into the final prediction model. We compare S³EL with several classical cross-project defect prediction approaches (Burak Filter, Peters Filter, TCA+, CODEP, and HYDRA) on the AEEEM and Promise datasets. The results show that S³EL achieves the best prediction performance in most cases when measured by F1.
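
The two stages described in the abstract (ratio-adjusted Naïve Bayes base learners, then a genetic algorithm that combines them using a few labeled target instances) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' S³EL implementation. Everything below is an assumption made for illustration: the synthetic two-metric data, undersampling of clean modules as the way to adjust the class ratio, the chosen ratios, a weighted average of posteriors as the combination rule, and all GA settings (population size, truncation selection, uniform crossover, Gaussian mutation).

```python
import math
import random

def fit_gnb(X, y):
    """Fit Gaussian Naive Bayes: per class, store (prior, feature means, variances)."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        varis = [sum((v - m) ** 2 for v in col) / n + 1e-6
                 for col, m in zip(zip(*rows), means)]
        model[c] = (n / len(X), means, varis)
    return model

def prob_defective(model, x):
    """Posterior probability that x belongs to class 1 (defective)."""
    logp = {}
    for c, (prior, means, varis) in model.items():
        logp[c] = math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, varis))
    top = max(logp.values())
    e0, e1 = math.exp(logp[0] - top), math.exp(logp[1] - top)
    return e1 / (e0 + e1)

def build_base_models(X, y, ratios, rng):
    """Assumed resampling: keep all defective rows, undersample clean rows per ratio."""
    defect = [x for x, label in zip(X, y) if label == 1]
    clean = [x for x, label in zip(X, y) if label == 0]
    models = []
    for r in ratios:  # r = clean rows kept per defective row
        k = max(2, min(len(clean), int(r * len(defect))))
        sample = rng.sample(clean, k)
        models.append(fit_gnb(defect + sample, [1] * len(defect) + [0] * k))
    return models

def ensemble_predict(models, weights, x):
    """Weighted average of base posteriors, thresholded at 0.5 (assumed rule)."""
    total = sum(weights) or 1.0
    score = sum(w * prob_defective(m, x) for m, w in zip(models, weights)) / total
    return 1 if score >= 0.5 else 0

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def evolve_weights(models, X_lab, y_lab, rng, pop_size=20, generations=30):
    """Simple GA over weight vectors; fitness is F1 on the labeled target rows."""
    def fitness(w):
        return f1_score(y_lab, [ensemble_predict(models, w, x) for x in X_lab])
    pop = [[rng.random() for _ in models] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]                 # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(len(child))
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

def make_project(rng, n, shift):
    """Hypothetical synthetic project; `shift` mimics a source/target distribution gap."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 5 == 0 else 0                # about 20% defective
        centre = 2.0 if label else 0.0
        X.append([rng.gauss(centre + shift, 1.0), rng.gauss(centre + shift, 1.0)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_src, y_src = make_project(rng, 200, shift=0.0)      # source project
X_tgt, y_tgt = make_project(rng, 100, shift=0.5)      # target project
models = build_base_models(X_src, y_src, ratios=[0.5, 1.0, 2.0, 3.0], rng=rng)
X_lab, y_lab = X_tgt[:20], y_tgt[:20]                 # the few labeled target rows
weights = evolve_weights(models, X_lab, y_lab, rng)
preds = [ensemble_predict(models, weights, x) for x in X_tgt[20:]]
print("F1 on unlabeled target rows:", round(f1_score(y_tgt[20:], preds), 3))
```

Using F1 on the labeled target instances as the fitness function mirrors the evaluation metric named in the abstract; whether the paper's genetic algorithm optimizes exactly this quantity is not stated here, so treat that choice as an assumption as well.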

Keywords:	Cross-project Defect Prediction; Semi-supervised Learning; Ensemble Learning; Genetic Algorithm; Naïve Bayes
