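Duplicate bug report detection by combining distributed representations of documents
Cite this article: ZENG Jie, BEN Ke-rong, ZHANG Xian, XU Yong-shi. Duplicate bug report detection by combining distributed representations of documents[J]. Computer Engineering & Science, 2021, 43(4): 670-680.
Author names: ZENG Jie  BEN Ke-rong  ZHANG Xian  XU Yong-shi
Affiliation: (College of Electronic Engineering, Naval University of Engineering, Wuhan 430033, Hubei)
Abstract: Duplicate bug report detection avoids repeated task assignment and repair for multiple reports that describe the same bug, and can reduce software maintenance costs. To further improve detection accuracy, a duplicate bug report detection method that combines distributed representations of documents is proposed. First, a Doc2Vec model is trained on a large-scale bug report database and the distributed representations of bug reports are extracted, encoding bug reports of different lengths as dense vectors of uniform length. Next, by comparing these vectors...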

Keywords: duplicate bug report  distributed representations of documents  Doc2Vec model  machine learning algorithm
Received: 2020-01-06
Revised: 2020-06-18

Duplicate bug report detection by combining distributed representations of documents
ZENG Jie, BEN Ke-rong, ZHANG Xian, XU Yong-shi. Duplicate bug report detection by combining distributed representations of documents[J]. Computer Engineering & Science, 2021, 43(4): 670-680.
Authors: ZENG Jie  BEN Ke-rong  ZHANG Xian  XU Yong-shi
Affiliation: (College of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China)
Abstract: Duplicate bug report detection avoids repeated assignment and repair of multiple bug reports that describe the same bug, and thus reduces the cost of software maintenance. To improve detection accuracy, this paper proposes a duplicate bug report detection method that combines distributed representations of documents. First, a Doc2Vec model is trained on a large-scale bug report database, and distributed representations of bug reports are extracted so that variable-length reports are encoded as fixed-length dense vectors. Second, the similarity between bug reports is computed by comparing their dense vectors; this similarity serves as a new feature and is combined with the traditional features commonly used in duplicate bug report detection, and a machine learning algorithm is used to train a binary classification model. Experimental results on public duplicate bug report datasets from Bugzilla show that, compared with the state-of-the-art method D_TS, the proposed method improves the F1 score by 2% on average, which indicates the effectiveness of the new feature.
Keywords: duplicate bug report  distributed representations of documents  Doc2Vec model  machine learning algorithm
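
The feature-combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dense vectors below are made-up placeholders standing in for Doc2Vec output, cosine similarity is an assumed choice of vector comparison, and Jaccard token overlap is an assumed stand-in for the "traditional features", which the abstract does not enumerate. The names `cosine_similarity`, `jaccard_similarity`, and `pair_features` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length dense vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def jaccard_similarity(text_a, text_b):
    """Token-set overlap; an assumed example of a traditional textual feature."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


def pair_features(report_a, report_b):
    """Build the feature vector for one candidate pair of bug reports.

    Each report is a dict with a "text" string and a fixed-length "vector"
    (assumed to come from a trained document-embedding model).
    """
    return [
        cosine_similarity(report_a["vector"], report_b["vector"]),  # new feature
        jaccard_similarity(report_a["text"], report_b["text"]),     # traditional feature
    ]


# Hypothetical reports; the vectors are illustrative, not real Doc2Vec output.
r1 = {"text": "App crashes when opening settings page", "vector": [0.2, 0.1, 0.7]}
r2 = {"text": "Crash on opening the settings page", "vector": [0.4, 0.2, 1.4]}
features = pair_features(r1, r2)
```

A list such as `features`, computed for every labeled pair, would then be passed to a binary classifier that predicts whether the two reports are duplicates; the choice of classifier is left open here, as in the abstract.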
This article is indexed in Wanfang Data and other databases.