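Defect Category Prediction Based on Multi-Source Domain Adaptation
Cite this article: XING Ying, ZHAO Meng-Ci, YANG Bin, ZHANG Yu-Wei, LI Wen-Jin, GU Jia-Wei, YUAN Jun. Defect Category Prediction Based on Multi-Source Domain Adaptation[J]. Journal of Software (软件学报), 2024, 35(7)
Authors: XING Ying  ZHAO Meng-Ci  YANG Bin  ZHANG Yu-Wei  LI Wen-Jin  GU Jia-Wei  YUAN Jun
Affiliations: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876; Key Laboratory for Safety-Critical Software Development and Verification, Ministry of Industry and Information Technology (Nanjing University of Aeronautics and Astronautics), Nanjing 211106; China Unicom Research Institute, Beijing 100048; Institute of Software, Chinese Academy of Sciences, Beijing 100190; NSFOCUS Technologies Group Co., Ltd., Beijing 100089
Funding: Key Laboratory for Safety-Critical Software Development and Verification, Ministry of Industry and Information Technology (NJ2023031); CCF-NSFOCUS "Kunpeng" Research Fund (CCF-NSFOCUS202212); Open Fund of the Yunnan Key Laboratory of Software Engineering (2023SE202); Open Fund of the High-Confidence Embedded Software Engineering Technology Laboratory (LHCESET202301)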
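Abstract: As software systems grow rapidly in scale and complexity, defects in them are unavoidable. In recent years, deep-learning-based defect prediction has become a research hotspot in software engineering. Such techniques can uncover latent defects without running the code, and have therefore attracted wide attention from industry and academia. However, most existing methods only determine whether method-level source code contains a defect and cannot identify the specific defect category, which reduces developers' efficiency in locating and fixing defects. Moreover, in real-world development, a new project usually lacks enough defect data to train a high-accuracy deep learning model, while models trained on historical data from existing projects often generalize poorly to the new project. This paper therefore first reformulates the traditional binary defect prediction task as a multi-label classification problem, using the defect categories described in CWE (Common Weakness Enumeration) as fine-grained prediction labels. To improve model performance in cross-project scenarios, it proposes a multi-source domain adaptation framework that combines adversarial training with an attention mechanism. Specifically, the framework uses adversarial training to reduce differences between domains (that is, software projects), and then uses domain-invariant features to obtain the feature correlation between each source domain and the target domain. The framework also uses weighted maximum mean discrepancy as an attention mechanism to minimize the representation distance between source-domain and target-domain features, so that the model learns more domain-independent features. Finally, extensive comparative experiments against state-of-the-art baselines on eight real-world open-source projects verify the effectiveness of the proposed method.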
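To make the multi-label formulation concrete, the following is a minimal sketch of how a method's CWE categories might be encoded as a multi-hot target vector and decoded back from per-label scores. The label set `CWE_LABELS`, the helper names, and the 0.5 threshold are illustrative assumptions; the abstract does not state which CWE categories or decision rule the paper uses.

```python
# Hypothetical label space: four real CWE identifiers chosen only for illustration.
CWE_LABELS = ["CWE-119", "CWE-125", "CWE-476", "CWE-787"]


def encode_labels(cwe_ids, label_space=CWE_LABELS):
    """Turn the CWE categories found in one method into a multi-hot vector."""
    present = set(cwe_ids)
    unknown = present - set(label_space)
    if unknown:
        raise ValueError(f"labels outside the label space: {sorted(unknown)}")
    return [1 if label in present else 0 for label in label_space]


def decode_scores(scores, threshold=0.5, label_space=CWE_LABELS):
    """Keep every label whose predicted score reaches the threshold."""
    return [label for label, s in zip(label_space, scores) if s >= threshold]


# A method with two defect categories, and a defect-free method (all zeros).
two_defects = encode_labels(["CWE-476", "CWE-119"])
clean = encode_labels([])
predicted = decode_scores([0.9, 0.2, 0.5, 0.1])
```

Unlike a binary defective/clean target, a multi-hot vector lets one method carry several categories at once, and a defect-free method is simply the all-zero vector.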

Keywords: defect category prediction  multi-source domain adaptation  adversarial training  attention mechanism
Received: 2023-09-10
Revised: 2023-10-30

Defect Category Prediction Based on Multi-Source Domain Adaptation
XING Ying,ZHAO Meng-Ci,YANG Bin,ZHANG Yu-Wei,LI Wen-Jin,GU Jia-Wei,YUAN Jun. Defect Category Prediction Based on Multi-Source Domain Adaptation[J]. Journal of Software, 2024, 35(7)
Authors:XING Ying  ZHAO Meng-Ci  YANG Bin  ZHANG Yu-Wei  LI Wen-Jin  GU Jia-Wei  YUAN Jun
Affiliation:School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China;Key Laboratory for Safety-Critical Software Development and Verification (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 211106, China;China Unicom Research Institute, Beijing 100048, China;Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;NSFOCUS Technologies Group Co., Ltd., Beijing 100089, China
Abstract: As software systems grow rapidly in scale and complexity, defects inevitably exist within them. In recent years, defect prediction techniques based on deep learning have become a prominent research topic in software engineering. These techniques can identify potential defects without executing the code, and have therefore attracted significant attention from both industry and academia. However, most existing approaches only determine whether method-level source code contains a defect and cannot identify the specific defect category, which reduces developers' efficiency in locating and fixing defects. Furthermore, in practical software development, new projects often lack sufficient defect data to train high-accuracy deep learning models, and models trained on historical data from existing projects frequently fail to generalize well to new projects. This paper therefore first reformulates the traditional binary defect prediction task as a multi-label classification problem, using the defect categories described in the Common Weakness Enumeration (CWE) as fine-grained prediction labels. To improve model performance in cross-project scenarios, this paper proposes a multi-source domain adaptation framework that integrates adversarial training and an attention mechanism. Specifically, the framework employs adversarial training to reduce discrepancies between domains (i.e., software projects), and further uses domain-invariant features to capture the feature correlation between each source domain and the target domain. The framework also employs weighted maximum mean discrepancy as an attention mechanism to minimize the representation distance between source-domain and target-domain features, helping the model learn more domain-independent features. Finally, extensive experiments on eight real-world open-source projects, comparing against state-of-the-art baselines, verify the effectiveness of the proposed approach.
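As a rough illustration of the distance the abstract refers to, the sketch below computes a biased estimate of squared maximum mean discrepancy (MMD) with a Gaussian kernel, then combines the per-source values with normalized weights. The function names, the kernel choice, the `gamma` parameter, and the simple weighted sum are assumptions made for illustration; the abstract does not specify how the paper derives or applies its weights.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two sets of feature vectors."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


def weighted_multi_source_mmd(sources, target, weights, gamma=1.0):
    """Weighted sum of per-source squared MMDs (assumed weighting scheme)."""
    norm = sum(weights)
    return sum(w / norm * mmd2(s, target, gamma)
               for s, w in zip(sources, weights))


# Toy features: source A matches the target, source B lies far away.
src_a = [[0.0, 0.0], [1.0, 0.0]]
src_b = [[5.0, 5.0], [6.0, 5.0]]
tgt = [[0.0, 0.0], [1.0, 0.0]]
loss = weighted_multi_source_mmd([src_a, src_b], tgt, weights=[3.0, 1.0])
```

In this toy setup the source that matches the target contributes zero distance, so the combined value comes entirely from the distant source scaled by its normalized weight. In a training loop such a term would be minimized alongside the classification loss.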
Keywords:defect category prediction  multi-source domain adaptation  adversarial training  attention mechanism