Cost Sensitive Boosting Software Defect Prediction Method (代价敏感的Boosting软件缺陷预测方法)
Citation: LI Li, REN Zhenkang, SHI Kexin. Cost Sensitive Boosting Software Defect Prediction Method[J]. Computer Engineering (计算机工程), 2022, 48(3): 175-180. DOI: 10.19678/j.issn.1000-3428.0061316
Authors: LI Li (李莉)  REN Zhenkang (任振康)  SHI Kexin (石可欣)
Affiliation: College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
Funding: Key Project of the Heilongjiang Province Education Science Planning (GJB1421251)
Abstract: Software defect prediction can effectively improve software reliability and help fix vulnerabilities in a system. Boosting resampling is a common way to address the shortage of samples in software defect prediction, but conventional Boosting methods perform poorly on the class imbalance problem in this domain. This paper therefore proposes a cost-sensitive Boosting software defect prediction method named CSBst. Because missed detections and false alarms for defective modules carry different costs, a cost-sensitive Boosting method is used to update the sample weights: the weight of samples that incur Type I errors is increased so that it exceeds both the weight of defect-free samples and the weight of samples that incur Type II errors, which improves the module prediction rate. A threshold moving method is used to combine the classification results of multiple decision tree base classifiers in order to address overfitting. On this basis, the optimal settings for the weights and the threshold used during model construction are derived analytically. Experiments on the NASA software defect prediction datasets show that, with small samples, the BAL prediction index of CSBst is 7% and 3% higher than that of the CSBKNN and CSCE methods, respectively, and that the time complexity is reduced by one order of magnitude.
Keywords: software defect prediction  decision tree  machine learning  threshold moving method  Boosting method
Received: 2021-03-30
Revised: 2021-06-12
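The abstract names three ingredients: a cost-sensitive Boosting weight update, threshold moving over the combined votes of decision tree base classifiers, and the BAL index. The following is a minimal sketch of how these pieces could fit together; it is not the authors' CSBst algorithm. It assumes an AdaBoost-style update, depth-1 decision trees (stumps) as base classifiers, label 1 for defective modules, and the balance measure commonly used in defect prediction, BAL = 1 - sqrt(pf^2 + (1 - pd)^2) / sqrt(2). The function names and the values of `cost_miss`, `cost_false_alarm`, and `threshold` are hypothetical; the abstract does not give the paper's actual settings.

```python
import math

def fit_stump(X, y, w):
    """Depth-1 decision tree: pick (feature, cut, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for cut in sorted({row[j] for row in X}):
            for pol in (1, -1):
                pred = [1 if pol * (row[j] - cut) >= 0 else 0 for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, cut, pol)
    return best

def stump_predict(stump, row):
    _, j, cut, pol = stump
    return 1 if pol * (row[j] - cut) >= 0 else 0

def train(X, y, rounds=10, cost_miss=2.0, cost_false_alarm=1.0):
    """AdaBoost-style loop with an assumed cost-sensitive weight update."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        if stump[0] >= 0.5:          # no better than chance: stop adding classifiers
            break
        err = max(stump[0], 1e-10)   # avoid log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        for i in range(n):
            if stump_predict(stump, X[i]) == y[i]:
                w[i] *= math.exp(-alpha)                     # correct: shrink weight
            elif y[i] == 1:
                w[i] *= cost_miss * math.exp(alpha)          # missed defect: largest boost
            else:
                w[i] *= cost_false_alarm * math.exp(alpha)   # false alarm: smaller boost
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def score(ensemble, row):
    """Weighted fraction of base classifiers voting 'defective', in [0, 1]."""
    total = sum(alpha for alpha, _ in ensemble)
    if total == 0:
        return 0.0
    return sum(alpha * stump_predict(s, row) for alpha, s in ensemble) / total

def predict(ensemble, X, threshold=0.5):
    """Threshold moving: a threshold below 0.5 flags more modules as defective."""
    return [1 if score(ensemble, row) >= threshold else 0 for row in X]

def bal(y_true, y_pred):
    """Balance index; assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pd, pf = tp / (tp + fn), fp / (fp + tn)
    return 1 - math.sqrt(pf ** 2 + (1 - pd) ** 2) / math.sqrt(2)

# Toy usage with one made-up metric per module (not NASA data).
X_demo = [[1], [2], [3], [4], [5], [6], [7], [8]]
y_demo = [0, 0, 1, 0, 0, 1, 1, 1]
model = train(X_demo, y_demo)
print(bal(y_demo, predict(model, X_demo, threshold=0.4)))
```

In this sketch, setting `cost_miss` above `cost_false_alarm` makes missed defects gain weight faster than false alarms, and lowering `threshold` trades more false alarms for fewer missed defects. Both would need tuning on real data.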
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).