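Oversampling Method for Software Defect Prediction (面向软件缺陷预测的过采样方法)
Cite as: 纪兴哲, 邵培南. 面向软件缺陷预测的过采样方法[J]. 计算机系统应用, 2022, 31(1): 242-248. DOI: 10.15888/j.cnki.csa.008284
Authors: 纪兴哲 (JI Xing-Zhe), 邵培南 (SHAO Pei-Nan)
Affiliation: The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808
Abstract: To alleviate the class imbalance problem in software defect prediction and to keep overfitting from degrading the accuracy of defect prediction models, this paper proposes an oversampling method for software defect prediction based on heterogeneous distance ranking (HDR). First, minority-class instances are divided into three categories and noise instances are removed, which reduces overfitting caused by noisy data. Then the instances are ranked by heterogeneous distance, and highly similar instances are paired to generate new instances, which improves the diversity of the new instances,...

Keywords: software defect prediction, class imbalance, oversampling, SMOTE, heterogeneous distance
Received: 2021-04-01
Revised: 2021-04-29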
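
The ranking-and-pairing step named in the abstract can be illustrated with a minimal sketch. This is not the authors' HDR implementation, because the abstract does not give the details. It assumes that "heterogeneous distance" means the Euclidean distance from a minority instance to its nearest majority instance, that "highly similar" instances are neighbours in that ranking, and that new instances are produced by SMOTE-style linear interpolation. The three-way categorisation and noise removal are omitted, and the function names are hypothetical.

```python
import numpy as np

def nearest_majority_distance(X_min, X_maj):
    """Distance from each minority instance to its closest majority instance.

    Assumption: this stands in for the paper's "heterogeneous distance".
    """
    diffs = X_min[:, None, :] - X_maj[None, :, :]   # shape (n_min, n_maj, d)
    dists = np.sqrt((diffs ** 2).sum(axis=2))       # shape (n_min, n_maj)
    return dists.min(axis=1)

def rank_pair_oversample(X_min, X_maj, n_new, seed=0):
    """Generate n_new synthetic minority instances (illustrative only).

    Minority instances are sorted by nearest-majority distance, and each
    synthetic point is interpolated between two neighbours in that ranking.
    Requires at least two minority instances.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(nearest_majority_distance(X_min, X_maj))
    ranked = X_min[order]
    # Pick a random adjacent pair (i, i + 1) in the ranking for each new point.
    idx = rng.integers(0, len(ranked) - 1, size=n_new)
    a, b = ranked[idx], ranked[idx + 1]
    gap = rng.random((n_new, 1))                    # interpolation weight in [0, 1)
    return a + gap * (b - a)
```

Pairing neighbours in the ranking keeps each synthetic point between two instances that sit at a similar distance from the majority class, which is one plausible reading of the abstract's description.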

Oversampling Method for Software Defect Prediction
JI Xing-Zhe, SHAO Pei-Nan. Oversampling Method for Software Defect Prediction[J]. Computer Systems & Applications, 2022, 31(1): 242-248. DOI: 10.15888/j.cnki.csa.008284
Authors: JI Xing-Zhe, SHAO Pei-Nan
Affiliation: The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China
Abstract: To alleviate the class imbalance problem in software defect prediction and to keep overfitting from degrading the accuracy of defect prediction models, this paper proposes an oversampling method for software defect prediction based on heterogeneous distance ranking (HDR). First, minority-class instances are divided into three categories and noise instances are removed, which reduces overfitting caused by noisy data. Then the instances are ranked by heterogeneous distance, and highly similar instances are paired to generate new instances, which improves the diversity of the new instances. Valuable minority instances that were deleted are restored afterward. The experiment compares the HDR algorithm with the SMOTE and Borderline-SMOTE algorithms, using an RF classifier on eight real-world project data sets from NASA. The results show improvements of 7.7% and 10.6% in F1-measure and G-Mean, respectively. The experimental results also show that the HDR algorithm is significantly better than the other algorithms on software defect prediction data sets with large data volumes and high imbalance rates.
Keywords: software defect prediction, class imbalance, oversampling, SMOTE, heterogeneous distance
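
The two indicators reported in the abstract, F1-measure and G-Mean, can be computed as in the sketch below. It assumes the usual definitions for binary classification (F1 as the harmonic mean of precision and recall, G-Mean as the geometric mean of recall and specificity) and that the defective class is labelled 1. The abstract does not state the exact formulas used in the paper, and the function name is hypothetical.

```python
import math

def f1_and_gmean(y_true, y_pred):
    """Return (F1, G-Mean) for binary labels, with 1 as the defective class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    gmean = math.sqrt(recall * specificity)
    return f1, gmean
```

Both indicators weigh the minority class more heavily than plain accuracy does, which is why they are commonly reported for imbalanced defect data.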
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).