
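Software Bug Localization Method Combining Information Retrieval and Deep Model Features
Cite this article: SHEN Zong-Wen, NIU Fei-fei, LI Chuan-Yi, CHEN Xiang, LI Qi, GE Ji-Dong, LUO Bin. Software Bug Localization Method Combining Information Retrieval and Deep Model Features[J]. Journal of Software, 2024, 35(7)
Authors: SHEN Zong-Wen  NIU Fei-fei  LI Chuan-Yi  CHEN Xiang  LI Qi  GE Ji-Dong  LUO Bin
Affiliations: National Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, Jiangsu, China; School of Information Science and Technology, Nantong University, Nantong 226019, Jiangsu, China
Funding: National Key Research and Development Program of China (2022YFF0711404); Sixth "333 Project" Leading Talent Team Program of Jiangsu Province; Natural Science Foundation of Jiangsu Province (No. BK20201250)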
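Abstract: Automated bug localization methods speed up the process by which programmers use bug reports to locate defective code in complex software systems. Early researchers treated bug localization as a retrieval task: they constructed defect features by analyzing bug reports and the related code, and then applied information retrieval methods to localize bugs. With the development of deep learning, bug localization methods that use deep model features have also achieved some success. However, because training deep models is relatively costly in time and resources, the experimental search space in existing deep-model-based bug localization studies does not match real-world conditions. At test time, these methods (for example DNNLOC, DeepLocator, and DreamLoc) do not take all the code in a project as the search space; they search only the code related to already known defects. This is inconsistent with the search scenario programmers face when localizing bugs in practice. To simulate the real scenario of bug localization, this paper proposes TosLoc, a method that combines information retrieval and deep model features. TosLoc first uses information retrieval to search all the source code of a real project, ensuring that existing features are fully used; it then uses a deep model to mine the semantics of the source code and the bug report to obtain the final localization result. Through this two-stage retrieval, TosLoc can localize bugs quickly across all the code of a single project. In experiments on four commonly used real-world Java projects, TosLoc surpasses existing baseline methods in both retrieval speed and accuracy. Compared with the best baseline, DreamLoc, TosLoc uses 35% of DreamLoc's retrieval time while improving average MRR by 2.5% and average MAP by 6.0%.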

Keywords: bug localization  bug reports  information retrieval  deep learning  search space
Received: 2023-09-11
Revised: 2023-10-30

Two-Stage Bug Location Method Combining Information Retrieval and Deep Model Features
SHEN Zong-Wen, NIU Fei-fei, LI Chuan-Yi, CHEN Xiang, LI Qi, GE Ji-Dong, LUO Bin. Two-Stage Bug Location Method Combining Information Retrieval and Deep Model Features[J]. Journal of Software, 2024, 35(7)
Authors: SHEN Zong-Wen  NIU Fei-fei  LI Chuan-Yi  CHEN Xiang  LI Qi  GE Ji-Dong  LUO Bin
Affiliations: National Key Laboratory for Novel Software Technology at Nanjing University, Nanjing 210023, China; School of Information Science and Technology, Nantong University, Nantong 226019, China
Abstract: Automated bug localization methods help programmers use bug reports to locate defects in complex software systems more quickly. Early researchers treated bug localization as a retrieval task, constructing defect features by analyzing bug reports and related code and then applying information retrieval techniques. With the development of deep learning, bug localization methods that use deep model features have also achieved some success. However, because training deep models is costly in time and resources, the experimental search space of existing deep-learning-based bug localization methods does not match real-world scenarios. During testing, these methods, such as DNNLOC, DreamLoc, and DeepLocator, do not treat all the files in a project as the search space; they search only the code related to already labeled defects. This is inconsistent with the search scenario programmers face when localizing real bugs. To simulate the real-world scenario of bug localization, this paper proposes the TosLoc method, which combines information retrieval and deep model features. First, TosLoc uses information retrieval to search all the source code of a real project, ensuring that existing features are fully used. It then uses deep models to extract semantics from the source code and bug reports to obtain the final localization results. Through this two-stage retrieval, TosLoc achieves rapid bug localization over all the code in a single project. Experiments on four widely used real-world Java projects show that TosLoc outperforms existing baseline methods in both retrieval speed and accuracy. Compared with the best baseline, DreamLoc, TosLoc improves average MRR by 2.5% and average MAP by 6.0% while requiring only 35% of DreamLoc's retrieval time.
Keywords: bug location  bug reports  information retrieval  deep learning  search space
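
The two-stage retrieval and the MRR and MAP metrics mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the TosLoc implementation: the abstract does not specify the retrieval features or the deep model, so stage 1 here assumes a plain TF-IDF score over every file, and stage 2 uses a token-overlap function (`jaccard`) purely as a placeholder where a learned semantic scorer would go. All function names, the toy files, and the cutoff `k` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_scores(report: str, files: Dict[str, str]) -> Dict[str, float]:
    """Stage 1 (assumed scoring): sum TF-IDF weights of report terms per file."""
    docs = {path: Counter(tokenize(src)) for path, src in files.items()}
    n = len(docs)
    # Document frequency: in how many files each term appears.
    df = Counter(term for counts in docs.values() for term in counts)
    query = set(tokenize(report))
    scores = {}
    for path, counts in docs.items():
        length = sum(counts.values()) or 1
        scores[path] = sum(
            (counts[t] / length) * (math.log((n + 1) / (df[t] + 1)) + 1.0)
            for t in query
            if t in counts
        )
    return scores


def two_stage_rank(
    report: str,
    files: Dict[str, str],
    rerank: Callable[[str, str], float],
    k: int = 2,
) -> List[str]:
    """Search ALL files lexically, then rerank only the top-k candidates."""
    lexical = tfidf_scores(report, files)
    candidates = sorted(files, key=lambda p: lexical[p], reverse=True)[:k]
    return sorted(candidates, key=lambda p: rerank(report, files[p]), reverse=True)


def jaccard(report: str, source: str) -> float:
    """Placeholder for a deep model: token-set overlap, NOT a learned score."""
    a, b = set(tokenize(report)), set(tokenize(source))
    return len(a & b) / len(a | b) if a | b else 0.0


def reciprocal_rank(ranked: List[str], buggy: Set[str]) -> float:
    """1 / rank of the first buggy file; averaging over reports gives MRR."""
    for i, path in enumerate(ranked, start=1):
        if path in buggy:
            return 1.0 / i
    return 0.0


def average_precision(ranked: List[str], buggy: Set[str]) -> float:
    """Mean precision at each buggy file's rank; averaging over reports gives MAP."""
    hits, total = 0, 0.0
    for i, path in enumerate(ranked, start=1):
        if path in buggy:
            hits += 1
            total += hits / i
    return total / len(buggy) if buggy else 0.0


# Toy project (hypothetical files) and a single bug report.
files = {
    "Parser.java": "class Parser { void parse(String input) { if (input == null) throw new NullPointerException(); } }",
    "Cache.java": "class Cache { Object get(String key) { return map.get(key); } }",
    "Logger.java": "class Logger { void log(String message) { System.out.println(message); } }",
}
report = "NullPointerException thrown when parse receives null input"
ranked = two_stage_rank(report, files, jaccard, k=2)
print(ranked, reciprocal_rank(ranked, {"Parser.java"}))
```

The design point the sketch tries to capture is that the inexpensive first stage covers the whole project, while the more expensive scorer runs only on the `k` surviving candidates, which is one plausible reading of how a two-stage method can cut retrieval time without shrinking the search space.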