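Cross-project Open Source Software Defect Prediction Based on Multi-Source Domain Adaptation and Data Augmentation
Cite this article: Li Guangjie, Tang Yi, He Yan, Zhang Qilei, Xing Ying, Zhao Mengci. Cross-project Open Source Software Defect Prediction Based on Multi-Source Domain Adaptation and Data Augmentation[J]. Artificial Intelligence Security (智能安全), 2024, 3(1): 62-73
Authors: Li Guangjie  Tang Yi  He Yan  Zhang Qilei  Xing Ying  Zhao Mengci
Affiliations: Collaborative Innovation Project Department, National Innovation Institute of Defense Technology (Li Guangjie, Tang Yi, He Yan, Zhang Qilei); Beijing University of Posts and Telecommunications (Xing Ying, Zhao Mengci)
Abstract: Predicting software defects by mining software code repository data is an important way to improve software quality and strengthen software security. Many machine learning-based methods have been proposed that mine defect data from software repositories to predict defects. However, because defect data extracted from different repositories is heterogeneous, the predictive performance of these methods is often unsatisfactory. To address this, this paper proposes a defect prediction method based on multi-source domain adaptation and data augmentation. The method improves prediction accuracy by mining feature similarities between the source repositories and the target repository: on the one hand, it uses a weighted maximum mean discrepancy (MMD) to minimize the distance between feature distributions; on the other hand, it uses an attention mechanism to increase the weight of source repositories that are highly similar to the target repository. Comparative experiments show that the proposed method achieves the best defect prediction performance among the compared methods.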

Keywords: defect prediction; multi-source domain adaptation; attention mechanism; data augmentation
Received: 2023-12-13
Revised: 2024-01-18

Cross-project Open Source Software Defect Prediction Based on Multi-Source Domain Adaptation and Data Augmentation
Li Guangjie, Tang Yi, He Yan, Zhang Qilei, Xing Ying and Zhao Mengci. Cross-project Open Source Software Defect Prediction Based on Multi-Source Domain Adaptation and Data Augmentation[J]. ARTIFICIAL INTELLIGENCE SECURITY, 2024, 3(1): 62-73
Authors: Li Guangjie  Tang Yi  He Yan  Zhang Qilei  Xing Ying  Zhao Mengci
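Affiliation: National Innovation Institute of Defense Technology, National Innovation Institute of Defense Technology, National Innovation Institute of Defense Technology, National Innovation Institute of Defense Technology, Beijing University of Posts and Telecommunications, Beijing University of Posts and Telecommunications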
Abstract: Predicting defects by mining software repositories (MSR) is crucial for enhancing the security and quality of software. With extensive software defect data acquired by mining various repositories, numerous machine learning-based approaches have been proposed for defect prediction. However, because defect data originating from different repositories is heterogeneous, the robustness of these approaches is significantly compromised. To address this, we propose DPDA, a defect prediction approach based on multi-source domain adaptation and data augmentation. Our approach mines feature similarities between the various source repositories and the target repository. Specifically, it employs a weighted maximum mean discrepancy (MMD) to minimize the distance between their feature distributions. Meanwhile, it assigns different attention scores to the source repositories, increasing the weight of those that are highly similar to the target repository. This weighting focuses the model on highly similar source repositories and reduces the impact of irrelevant ones. Comparative experiments show that our approach achieves the best performance in predicting software defects among the compared methods.
Keywords: defect prediction; multi-source domain adaptation; attention mechanism; data augmentation
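
The abstract names two ingredients, a weighted maximum mean discrepancy and attention scores over source repositories, without giving their formulas. The following is a minimal sketch of how the two could fit together, not the authors' implementation. It assumes a Gaussian (RBF) kernel, a biased MMD estimate, and a softmax over negative MMD values standing in for the learned attention scores; the names `mmd_squared`, `attention_weights`, `weighted_mmd_loss` and the parameters `gamma` and `temperature` are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


def attention_weights(distances, temperature=1.0):
    """Softmax over negative distances: closer sources get larger weights.

    This is a stand-in for learned attention scores (an assumption).
    """
    scores = [-d / temperature for d in distances]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def weighted_mmd_loss(sources, target, gamma=1.0, temperature=1.0):
    """Return the weighted sum of per-source MMD values and the weights used."""
    distances = [mmd_squared(s, target, gamma) for s in sources]
    weights = attention_weights(distances, temperature)
    loss = sum(w * d for w, d in zip(weights, distances))
    return loss, weights


# Toy feature vectors: one source close to the target, one far away.
target_repo = [[0.0, 0.0], [0.1, 0.1]]
near_source = [[0.0, 0.1], [0.1, 0.0]]
far_source = [[3.0, 3.0], [3.2, 2.9]]

loss, weights = weighted_mmd_loss([near_source, far_source], target_repo)
print("weights:", weights)
print("weighted MMD loss:", loss)
```

In this sketch the source whose features lie closer to the target receives the larger weight, so dissimilar repositories contribute less to the alignment term. In a trainable model the features would come from a shared encoder and the loss would be minimized by gradient descent, which this toy example does not show.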