
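A Cross-project Defect Prediction Method Using Adversarial Learning
Cite this article:XING Ying, QIAN Xiao-Meng, GUAN Yu, ZHANG Shi-Hao, ZHAO Meng-Ci, LIN Wan-Ting. A Cross-project Defect Prediction Method Using Adversarial Learning[J]. Journal of Software, 2022, 33(6): 2097-2112
Authors:XING Ying  QIAN Xiao-Meng  GUAN Yu  ZHANG Shi-Hao  ZHAO Meng-Ci  LIN Wan-Ting
Affiliations:School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876;School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876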
Funding:National Natural Science Foundation of China (61702044); National Key Research and Development Program of China (2017YFD0401001)
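Abstract:Cross-project defect prediction (CPDP) has become an important research direction in software engineering data mining. It uses defective code from other projects to build prediction models, which addresses the shortage of data during model construction. However, the code files of the source and target projects differ in data distribution, which degrades cross-project prediction. Drawing on the adversarial learning idea of the generative adversarial network (GAN), the method uses a discriminator to shift the distribution of target project features toward that of source project features, thereby improving the performance of cross-project defect prediction. Specifically, the proposed abstract continuous generative adversarial network (AC-GAN) method comprises two stages, data processing and model construction: (1) the code of the source and target projects is first converted into abstract syntax trees (AST); the trees are then traversed depth-first to obtain node sequences; a continuous bag-of-words (CBOW) model then generates word vectors, and the node sequences are converted into numeric vectors according to the word vector table; (...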

Keywords:cross-project defect prediction  generative adversarial network  continuous bag-of-words model  abstract syntax tree
Received:2021-09-05
Revised:2021-10-15

Cross-project Defect Prediction Method Using Adversarial Learning
XING Ying,QIAN Xiao-Meng,GUAN Yu,ZHANG Shi-Hao,ZHAO Meng-Ci,LIN Wan-Ting. Cross-project Defect Prediction Method Using Adversarial Learning[J]. Journal of Software, 2022, 33(6): 2097-2112
Authors:XING Ying  QIAN Xiao-Meng  GUAN Yu  ZHANG Shi-Hao  ZHAO Meng-Ci  LIN Wan-Ting
Affiliation:School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China;School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract:Cross-project defect prediction (CPDP) has become an important research direction in software engineering data mining. It uses defective code from other projects to build prediction models, which addresses the problem of insufficient data during model construction. However, the data distributions of code files in the source and target projects differ, which leads to poor cross-project prediction results. Based on the adversarial learning idea of the generative adversarial network (GAN), this paper uses a discriminator to shift the distribution of target project features toward the distribution of source project features, thereby improving the performance of cross-project defect prediction. Specifically, the proposed abstract continuous generative adversarial network (AC-GAN) method consists of two stages: data processing and model construction. (1) First, the source and target project code is converted into abstract syntax trees (ASTs), which are then traversed depth-first to derive token sequences. A continuous bag-of-words (CBOW) model is used to generate word vectors, and the token sequences are transformed into numeric vectors based on the word vector table. (2) The processed numeric vectors are fed into a GAN-based model for feature extraction and data migration. Finally, a binary classifier determines whether each target project code file is defective. Comparison experiments on 15 source-target project pairs demonstrate the effectiveness of the AC-GAN method.
Keywords:Cross-project defect prediction  Generative adversarial network  Continuous bag-of-words model  Abstract syntax tree
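
The data processing stage described in the abstract (AST conversion, depth-first traversal, mapping tokens to numbers) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: Python's built-in `ast` module stands in for whatever parser the paper uses, AST node type names stand in for its tokens, a plain integer lookup stands in for the CBOW word vector table, and the function names `dfs_node_sequence`, `build_vocab`, `encode`, and `cbow_pairs` are hypothetical.

```python
import ast


def dfs_node_sequence(source):
    """Parse source code into an AST and return its node type names
    in depth-first (pre-order) order."""
    tree = ast.parse(source)
    sequence = []
    stack = [tree]
    while stack:
        node = stack.pop()
        sequence.append(type(node).__name__)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(list(ast.iter_child_nodes(node))))
    return sequence


def build_vocab(sequences):
    """Assign each distinct token an integer id starting at 1.
    Id 0 is reserved for padding and unknown tokens (an assumption)."""
    vocab = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(seq, vocab, length):
    """Turn a token sequence into a fixed-length list of integer ids,
    truncating long sequences and zero-padding short ones."""
    ids = [vocab.get(tok, 0) for tok in seq][:length]
    return ids + [0] * (length - len(ids))


def cbow_pairs(seq, window=2):
    """Yield (context tokens, centre token) pairs, the kind of training
    example a CBOW model predicts the centre token from."""
    for i, centre in enumerate(seq):
        context = seq[max(0, i - window):i] + seq[i + 1:i + 1 + window]
        if context:
            yield context, centre


if __name__ == "__main__":
    tokens = dfs_node_sequence("x = 1")
    vocab = build_vocab([tokens])
    print(tokens)
    print(encode(tokens, vocab, length=8))
```

In a fuller pipeline, the integer ids would index rows of an embedding matrix trained with CBOW on pairs like those from `cbow_pairs`, and the resulting vectors would feed the GAN-based model. The fixed length of 8 and the window of 2 are arbitrary choices for the example.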
This article is indexed in Wanfang Data and other databases.