首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
在开源软件开发的维护阶段, 开源软件缺陷报告为开发人员解决缺陷提供了大量帮助。然而, 开源软件缺陷报告通常是以用户对话的形式编写, 一个软件缺陷报告可能含有数十条评论和上千个句子, 导致开发人员难以阅读或理解软件缺陷报告。为了缓解这个问题, 人们提出了开源软件缺陷报告自动摘要, 缺陷报告自动摘要可以减少开发人员阅读冗长缺陷报告的时间。本文以综述的方式对开源软件缺陷报告自动摘要的研究做了系统的归纳总结。首先, 根据摘要的表现形式, 将开源软件缺陷报告摘要分类为固定缺陷报告摘要和可视化缺陷报告摘要, 再将固定缺陷报告摘要研究方法分类为基于监督学习方法和基于无监督学习方法, 之后总结了基于监督学习和无监督学习的开源软件缺陷报告摘要生成的工作框架, 并介绍了开源软件缺陷报告摘要领域常用数据集、预处理技术和摘要评估指标。其次, 本文以无监督学习为切入点, 分类阐述和归纳了无监督开源软件缺陷报告摘要方法, 将无监督开源软件缺陷报告摘要方法分类为: 基于特征评分方法、基于深度学习方法、基于图方法和基于启发式方法, 并对每类方法进行讨论与分析。再次, 从缺陷报告摘要的实用性出发, 对现有的缺陷报告可视化摘要研究成果进行总结,并对固定缺陷报告摘要和可视化缺陷报告摘要的实用性做出分析。最后, 对现有研究成果及综述进行讨论和分析, 指出了开源软件缺陷报告摘要领域在缺陷报告数据集、抽取式摘要和黄金标准摘要三个方面面临的挑战和对未来研究的展望。  相似文献   

2.
席圣渠  姚远  徐锋  吕建 《软件学报》2018,29(8):2322-2335
随着开源软件项目规模的不断增大,人工为缺陷报告分派合适的开发人员(缺陷分派)变得越来越困难.而不合适的缺陷分派往往会严重影响缺陷修复的效率,为此迫切需要一种缺陷分派辅助技术帮助项目管理者更好地完成缺陷分派任务.当前,大部分研究工作都基于缺陷报告文本以及相关元数据信息分析来刻画开发者的特征,忽略了对开发者活跃度的考虑,使得对具有相似特征的开发者进行缺陷报告分派预测时表现较差.本文提出了一个基于循环神经网络的深度学习模型DeepTriage,一方面利用双向循环网络加池化方法提取缺陷报告的文本特征,一方面利用单向循环网络提取特定时刻的开发者活跃度特征,并融合两者,利用已修复的缺陷报告进行监督学习.在Eclipse等四个不同的开源项目数据集上的实验结果表明,DeepTriage较同类工作在缺陷分派预测准确率上有显著提升.  相似文献   

3.
Information Retrieval (IR) approaches, such as Latent Semantic Indexing (LSI) and Vector Space Model (VSM), are commonly applied to recover software traceability links. Recently, an approach based on developers’ eye gazes was proposed to retrieve traceability links. This paper presents a comparative study on IR and eye-gaze based approaches. In addition, it reports on the possibility of using eye gaze links as an alternative benchmark in comparison to commits. The study conducted asked developers to perform bug-localization tasks on the open source subject system JabRef. The iTrace environment, which is an eye tracking enabled Eclipse plugin, was used to collect eye gaze data. During the data collection phase, an eye tracker was used to gather the source code entities (SCE’s), developers looked at while solving these tasks. We present an algorithm that uses the collected gaze dataset to produce candidate traceability links related to the tasks. In the evaluation phase, we compared the results of our algorithm with the results of an IR technique, in two different contexts. In the first context, precision and recall metric values are reported for both IR and eye gaze approaches based on commits. In the second context, another set of developers were asked to rate the candidate links from each of the two techniques in terms of how useful they were in fixing the bugs. The eye gaze approach outperforms standard LSI and VSM approaches and reports a 55 % precision and 67 % recall on average for all tasks when compared to how the developers actually fixed the bug. In the second context, the usefulness results show that links generated by our algorithm were considered to be significantly more useful (to fix the bug) than those of the IR technique in a majority of tasks. We discuss the implications of this radically different method of deriving traceability links. Techniques for feature location/bug localization are commonly evaluated on benchmarks formed from commits as is done in the evaluation phase of this study. Although, commits are a reasonable source, they only capture entities that were eventually changed to fix a bug or resolve a feature. We investigate another type of benchmark based on eye tracking data, namely links generated from the bug-localization tasks given to the developers in the data collection phase. The source code entities relevant to subjected bugs recommended from IR methods are evaluated on both commits and links generated from eye gaze. The results of the benchmarking phase show that the use of eye tracking could form an effective (complementary) benchmark and add another interesting perspective in the evaluation of bug-localization techniques.  相似文献   

4.
陈理国  刘超 《软件学报》2014,25(6):1169-1179
在软件系统中,缺陷定位是缺陷修复的一个关键环节,如果能将缺陷自动定位到很小的范围,将会极大地降低缺陷修复的难度.基于高斯过程提出了一种缺陷定位方法(GPBL),即针对每个缺陷,向开发人员推荐这个缺陷可能存在于哪些源文件中,从而帮助开发人员快速修复缺陷.为了验证方法的有效性,采集了开源软件Eclipse 和Argouml 中的数据,实验结果表明,高斯过程缺陷定位的查全率和查准率平均分别为87.16%和78.90%.与基于LDA的缺陷定位方法进行比较,表明高斯过程更能准确定位缺陷的位置.  相似文献   

5.
郑炜  陈军正  吴潇雪  陈翔  夏鑫 《软件学报》2020,31(5):1294-1313
软件安全问题的发生在大多数情况下会造成非常严重的后果,及早发现安全问题,是预防安全事故的关键手段之一.安全缺陷报告预测可以辅助开发人员及早发现被测软件中潜藏的安全缺陷,从而尽早得以修复.然而,由于安全缺陷在实际项目中的数量较少,而且特征复杂(即安全缺陷类型繁多,不同类型安全缺陷特征差异性较大),这使得手工提取特征相对困难,并随后造成传统机器学习分类算法在安全缺陷报告预测性能方面存在一定的瓶颈.针对该问题,提出基于深度学习的安全缺陷报告预测方法,采用深度文本挖掘模型TextCNN和TextRNN构建安全缺陷报告预测模型;针对安全缺陷报告文本特征,使用Skip-Gram方式构建词嵌入矩阵,并借助注意力机制对TextRNN模型进行优化.所构建的模型在5个不同规模的安全缺陷报告数据集上展开了大规模实证研究,实证结果表明,深度学习模型在80%的实验案例中都优于传统机器学习分类算法,性能指标F1-score平均可提升0.258,在最好的情况下甚至可以提升0.535.此外,针对安全缺陷报告数据集存在的类不均衡问题,对不同采样方法进行了实证研究,并对结果进行了分析.  相似文献   

6.
7.
Concurrency debugging is an extremely important yet challenging problem that has been hampering developer productivity and software reliability in the multicore era. We have worked on this problem in the past eight years and have developed several effective methods and automated tools for helping developers debugging shared memory concurrent programs. This article discusses challenges in concurrency debugging and summarizes our research contributions in four important directions: concurrency bug reproduction, detection, understanding, and fixing. It also discusses other recent advances in tackling these challenges.  相似文献   

8.
Software crashes are severe manifestations of software bugs. Debugging crashing bugs is tedious and time-consuming. Understanding software changes that induce a crashing bug can provide useful contextual information for bug fixing and is highly demanded by developers. Locating the bug inducing changes is also useful for automatic program repair, since it narrows down the root causes and reduces the search space of bug fix location. However, currently there are no systematic studies on locating the software changes to a source code repository that induce a crashing bug reflected by a bucket of crash reports. To tackle this problem, we first conducted an empirical study on characterizing the bug inducing changes for crashing bugs (denoted as crash-inducing changes). We also propose ChangeLocator, a method to automatically locate crash-inducing changes for a given bucket of crash reports. We base our approach on a learning model that uses features originated from our empirical study and train the model using the data from the historical fixed crashes. We evaluated ChangeLocator with six release versions of Netbeans project. The results show that it can locate the crash-inducing changes for 44.7%, 68.5%, and 74.5% of the bugs by examining only top 1, 5 and 10 changes in the recommended list, respectively. It significantly outperforms the existing state-of-the-art approach.  相似文献   

9.
ContextBug report assignment, namely, to assign new bug reports to developers for timely and effective bug resolution, is crucial for software quality assurance. However, with the increasing size of software system, it is difficult to assign bugs to appropriate developers for bug managers.ObjectiveThis paper propose an approach, called KSAP (K-nearest-neighbor search and heterogeneous proximity), to improve automatic bug report assignment by using historical bug reports and heterogeneous network of bug repository.MethodWhen a new bug report was submitted to the bug repository, KSAP assigns developers for the bug report by using a two-phase procedure. The first phase is to search historically-resolved similar bug reports to the new bug report by K-nearest-neighbor (KNN) method. The second phase is to rank the developers who contributed to those similar bug reports by heterogeneous proximity.ResultsWe collected bug repositories of Mozilla, Eclipse, Apache Ant and Apache Tomcat6 projects to investigate the performance of the proposed KSAP approach. Experimental results demonstrate that KSAP can improve the recall of bug report assignment between 7.5–32.25% in comparison with the state of art techniques. When there is only a small number of developer collaborations on common bug reports, KSAP has shown its excellence over other sate of art techniques. When we tune the parameters of the number of historically-resolved similar bug reports (K) and the number of developers (Q) for recommendation, KSAP keeps its superiority steadily.ConclusionThis is the first paper to demonstrate how to automatically build heterogeneous network of a bug repository and extract meta-paths of developer collaborations from the heterogeneous network for bug report assignment.  相似文献   

10.
孙小兵  周澄  杨辉  李斌 《软件学报》2018,29(8):2294-2305
软件开发与维护过程中常会出现一些安全性缺陷,这些安全性缺陷会给软件和用户带来很大的风险.安全性缺陷在修复过程中,其修复级别和质量要求往往高于一般性的缺陷,因此,推荐出富有安全性经验的开发者及时有效地修复这些安全性缺陷非常重要.现有的开发者推荐技术在推荐开发者时仅仅考虑了开发者的历史开发内容,很少考虑到开发人员的安全性缺陷修复经验和修复质量等因素,所以这些技术不适用于安全性缺陷的开发者推荐.本文针对安全性缺陷的修复提出了一种有效的软件开发者推荐方法SecDR.SecDR在推荐开发者时不仅考虑了开发者的历史开发内容(与安全性相关),还分析了开发者的修复质量和历史修复缺陷的复杂度等因素.此外,SecDR还实现了开发者的多经验级别推荐:推荐初级开发者修复简单的安全性缺陷,高级开发者修复复杂的安全性缺陷.本文在三个开源项目(Mozilla,Libgdx,ElasticSearch)上分别对SecDR推荐开发者进行有效性验证.通过对比实验证明,SecDR针对安全性缺陷推荐开发者相比于其他方法(如:DR_PSF)的推荐精度平均高出19%~42%.另外,实验对比了SecDR与实际开发人员的分配情况,结果显示SecDR可以更好地规避不合理的软件开发者的推荐.  相似文献   

11.
Machine learning (ML) techniques and algorithms have been successfully and widely used in various areas including software engineering tasks. Like other software projects, bugs are also common in ML projects and libraries. In order to more deeply understand the features related to bug fixing in ML projects, we conduct an empirical study with 939 bugs from five ML projects by manually examining the bug categories, fixing patterns, fixing scale, fixing duration, and types of maintenance. The results show that (1) there are commonly seven types of bugs in ML programs; (2) twelve fixing patterns are typically used to fix the bugs in ML programs; (3) 68.80% of the patches belong to micro-scale-fix and small-scale-fix; (4) 66.77% of the bugs in ML programs can be fixed within one month; (5) 45.90% of the bug fixes belong to corrective activity from the perspective of software maintenance. Moreover, we perform a questionnaire survey and send them to developers or users of ML projects to validate the results in our empirical study. The results of our empirical study are basically consistent with the feedback from developers. The findings from the empirical study provide useful guidance and insights for developers and users to effectively detect and fix bugs in MLprojects.  相似文献   

12.
Toward an understanding of bug fix patterns   总被引:1,自引:1,他引:0  
Twenty-seven automatically extractable bug fix patterns are defined using the syntax components and context of the source code involved in bug fix changes. Bug fix patterns are extracted from the configuration management repositories of seven open source projects, all written in Java (Eclipse, Columba, JEdit, Scarab, ArgoUML, Lucene, and MegaMek). Defined bug fix patterns cover 45.7% to 63.3% of the total bug fix hunk pairs in these projects. The frequency of occurrence of each bug fix pattern is computed across all projects. The most common individual patterns are MC-DAP (method call with different actual parameter values) at 14.9–25.5%, IF-CC (change in if conditional) at 5.6–18.6%, and AS-CE (change of assignment expression) at 6.0–14.2%. A correlation analysis on the extracted pattern instances on the seven projects shows that six have very similar bug fix pattern frequencies. Analysis of if conditional bug fix sub-patterns shows a trend towards increasing conditional complexity in if conditional fixes. Analysis of five developers in the Eclipse projects shows overall consistency with project-level bug fix pattern frequencies, as well as distinct variations among developers in their rates of producing various bug patterns. Overall, data in the paper suggest that developers have difficulty with specific code situations at surprisingly consistent rates. There appear to be broad mechanisms causing the injection of bugs that are largely independent of the type of software being produced.
E. James Whitehead Jr.Email:
  相似文献   

13.
Efforts to improve application reliability can be irrelevant if the reliability of the underlying operating system on which the application resides is not seriously considered. An important first step in improving the reliability of an operating system is to gain insights into why and how the bugs originate, contributions of the different modules to the bugs, their distribution across severities, the different ways in which the bugs may be resolved and the impact of bug severities on their resolution times. To acquire this insight, we conducted an extensive analysis of the publicly available bug data on the Linux kernel over a period of seven years. We also justify and explain the statistical bug occurrence trends observed from the data, using the architecture of the Linux kernel as an anchor. The statistical analysis of the Linux bug data suggests that the Linux kernel may draw significant benefits from the continual reliability improvement efforts of its developers. These efforts, however, are disproportionately targeted towards popular configurations and hardware platforms, due to which the reliability of these configurations may be better than those that are not commonly used. Thus, a key finding of our study is that it may be prudent to restrict to using common configurations and platforms when using open source systems such as Linux in applications with stringent reliability expectations. Finally, our study of the architectural properties of the bugs suggests that the dependence among the modules rather than the unreliabilities of the individual modules is the primary cause of the bugs and their impact on system reliability.  相似文献   

14.
Bug fixing has a key role in software quality evaluation. Bug fixing starts with the bug localization step, in which developers use textual bug information to find location of source codes which have the bug. Bug localization is a tedious and time consuming process. Information retrieval requires understanding the programme's goal, coding structure, programming logic and the relevant attributes of bug. Information retrieval (IR) based bug localization is a retrieval task, where bug reports and source files represent the queries and documents, respectively. In this paper, we propose BugCatcher, a newly developed bug localization method based on multi‐level re‐ranking IR technique. We evaluate BugCatcher on three open source projects with approximately 3400 bugs. Our experiments show that multi‐level reranking approach to bug localization is promising. Retrieval performance and accuracy of BugCatcher are better than current bug localization tools, and BugCatcher has the best Top N, Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) values for all datasets.  相似文献   

15.
Developers commonly make use of a web search engine such as Google to locate online resources to improve their productivity. A better understanding of what developers search for could help us understand their behaviors and the problems that they meet during the software development process. Unfortunately, we have a limited understanding of what developers frequently search for and of the search tasks that they often find challenging. To address this gap, we collected search queries from 60 developers, surveyed 235 software engineers from more than 21 countries across five continents. In particular, we asked our survey participants to rate the frequency and difficulty of 34 search tasks which are grouped along the following seven dimensions: general search, debugging and bug fixing, programming, third party code reuse, tools, database, and testing. We find that searching for explanations for unknown terminologies, explanations for exceptions/error messages (e.g., HTTP 404), reusable code snippets, solutions to common programming bugs, and suitable third-party libraries/services are the most frequent search tasks that developers perform, while searching for solutions to performance bugs, solutions to multi-threading bugs, public datasets to test newly developed algorithms or systems, reusable code snippets, best industrial practices, database optimization solutions, solutions to security bugs, and solutions to software configuration bugs are the most difficult search tasks that developers consider. Our study sheds light as to why practitioners often perform some of these tasks and why they find some of them to be challenging. We also discuss the implications of our findings to future research in several research areas, e.g., code search engines, domain-specific search engines, and automated generation and refinement of search queries.  相似文献   

16.
当软件缺陷被提交到缺陷跟踪系统并经过确认之后,它会被分发给开发人员进行缺陷修复.这个过程就叫做缺陷分发.随着被提交到系统的缺陷报告日益增多,手工分发缺陷报告会变得越来越困难.提出了一种自动分发缺陷的方法BUTTER.与其他方法不同的是,BUTTER不仅利用主题模型分析缺陷报告中的文本信息,而且创新性地建立了一个包含提交者、缺陷和开发者三种节点及其相互关系的异构网络,从该异构网络中抽取了更多的结构信息.实验证明,BUTTER进行自动缺陷分发较其他缺陷自动分发方法要好.  相似文献   

17.
随着社交媒体的迅速发展,信息过载问题越发严重,因此如何从海量、短小而充满噪声的社交媒体数据中发现和挖掘出热点话题或者热点事件成为一个重要的问题。结合社交媒体数据实时性、地理性、包含较多元数据等特点,提出了用户行为分析与文本内容分析相结合的热点挖掘方法。在内容分析过程中,提出了从更细的词语粒度进行聚类,以代替传统的在消息粒度进行聚类的经典方法。为了提高话题关键词提取的效果,引入了基于词向量技术,并通过语义聚类的方法进行热点挖掘。在真实数据集上的实验结果表明,该方法提取的关键词语义关联性强、话题划分效果好,在主要指标上优于传统的热点挖掘方法。  相似文献   

18.
张天伦  陈荣  杨溪  祝宏玉 《软件学报》2019,30(5):1386-1406
在所有的软件系统开发过程中,Bug的存在是不可避免的问题.对于软件系统的开发者来说,修复Bug最有利的工具就是Bug报告.但是人工识别Bug报告会给开发人员带来新的负担,因此,自动对Bug报告进行分类是一项很有必要的工作.基于此,提出用基于极速学习机的方法来对Bug报告进行分类.具体而言,主要解决Bug报告自动分类的3个问题:第1个是Bug报告数据集里不同类别的样本数量不平衡问题;第2个是Bug报告数据集里被标注的样本不充足问题;第3个是Bug报告数据集总体样本量不充足问题.为了解决这3个问题,分别引入了基于代价的有监督分类方法、基于模糊度的半监督学习方法以及样本迁移方法.通过在多个Bug报告数据集上进行实验,验证了这些方法的可行性和有效性.  相似文献   

19.
The popularity of mobile devices has been steadily growing in recent years. These devices heavily depend on software from the underlying operating systems to the applications they run. Prior research showed that mobile software is different than traditional, large software systems. However, to date most of our research has been conducted on traditional software systems. Very little work has focused on the issues that mobile developers face. Therefore, in this paper, we use data from the popular online Q&A site, Stack Overflow, and analyze 13,232,821 posts to examine what mobile developers ask about. We employ Latent Dirichlet allocation-based topic models to help us summarize the mobile-related questions. Our findings show that developers are asking about app distribution, mobile APIs, data management, sensors and context, mobile tools, and user interface development. We also determine what popular mobile-related issues are the most difficult, explore platform specific issues, and investigate the types (e.g., what, how, or why) of questions mobile developers ask. Our findings help highlight the challenges facing mobile developers that require more attention from the software engineering research and development communities in the future and establish a novel approach for analyzing questions asked on Q&A forums.  相似文献   

20.
This study presents a literature analysis using a semiautomated text mining and topic modelling approach of the body of knowledge encompassed in 17 years (2000–2016) of literature published in the Wiley's Expert Systems journal, a key reference in Expert Systems (ESs) research, in a total of 488 research articles. The methodological approach included analysing countries from authors' affiliations, with results emphasizing the relevance of both U.S. and U.K. researchers, with Chinese, Turkish, and Spanish holding also a significant relevance. As a result of the sparsity found on the keywords, one of our goals became to devise a taxonomy for future submissions under 2 core dimensions: ESs' methods and ESs' applications. Finally, through topic modelling, data‐driven methods were unveiled as the most relevant, pairing with evaluation methods in its application to managerial sciences, arts, and humanities. Findings also show that most of the application domains are well represented, including health, engineering, energy, and social sciences.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号