期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Revisiting the performance of automated approaches for the retrieval of duplicate reports in issue tracking systems that perform just-in-time duplicate retrieval

Mohamed Sami Rakha Cor-Paul Bezemer Ahmed E. Hassan 《Empirical Software Engineering》2018,23(5):2597-2621

Issue tracking systems (ITSs) allow software end-users and developers to file issue reports and change requests. Reports are frequently duplicately filed for the same software issue. The retrieval of these duplicate issue reports is a tedious manual task. Prior research proposed several automated approaches for the retrieval of duplicate issue reports. Recent versions of ITSs added a feature that does basic retrieval of duplicate issue reports at the filing time of an issue report in an effort to avoid the filing of duplicates as early as possible. This paper investigates the impact of this just-in-time duplicate retrieval on the duplicate reports that end up in the ITS of an open source project. In particular, we study the differences between duplicate reports for open source projects before and after the activation of this new feature. We show how the experimental results of prior research would vary given the new data after the activation of the just-in-time duplicate retrieval feature. We study duplicate issue reports from the Mozilla-Firefox, Mozilla-Core and Eclipse-Platform projects. In addition, we compare the performance of the state of the art of the automated retrieval of duplicate reports using two popular approaches (i.e., BM25F and REP). We find that duplicate issue reports after the activation of the just-in-time duplicate retrieval feature are less textually similar, have a greater identification delay and require more discussion to be retrieved as duplicate reports than duplicates before the activation of the feature. Prior work showed that REP outperforms BM25F in terms of Recall rate and Mean average precision. We observe that the performance gap between BM25F and REP becomes even larger after the activation of the just-in-time duplicate retrieval feature. We recommend that future studies focus on duplicates that were reported after the activation of the just-in-time duplicate retrieval feature as these duplicates are more representative of future incoming issue reports and therefore, give a better representation of the future performance of proposed approaches. 相似文献

2.

Automatic test report augmentation to assist crowdsourced testing

Xin CHEN He JIANG Zhenyu CHEN Tieke HE Liming NIE 《Frontiers of Computer Science》2019,13(5):943

相似文献

3.

On the unreliability of bug severity data

Yuan Tian Nasir Ali David Lo Ahmed E. Hassan 《Empirical Software Engineering》2016,21(6):2298-2323

Severity levels, e.g., critical and minor, of bugs are often used to prioritize development efforts. Prior research efforts have proposed approaches to automatically assign the severity label to a bug report. All prior efforts verify the accuracy of their approaches using human-assigned bug reports data that is stored in software repositories. However, all prior efforts assume that such human-assigned data is reliable. Hence a perfect automated approach should be able to assign the same severity label as in the repository – achieving a 100% accuracy. Looking at duplicate bug reports (i.e., reports referring to the same problem) from three open-source software systems (OpenOffice, Mozilla, and Eclipse), we find that around 51 % of the duplicate bug reports have inconsistent human-assigned severity labels even though they refer to the same software problem. While our results do indicate that duplicate bug reports have unreliable severity labels, we believe that they send warning signals about the reliability of the full bug severity data (i.e., including non-duplicate reports). Future research efforts should explore if our findings generalize to the full dataset. Moreover, they should factor in the unreliable nature of the bug severity data. Given the unreliable nature of the severity data, classical metrics to assess the accuracy of models/learners should not be used for assessing the accuracy of approaches for automated assigning severity label. Hence, we propose a new approach to assess the performance of such models. Our new assessment approach shows that current automated approaches perform well – 77-86 % agreement with human-assigned severity labels. 相似文献

4.

基于嵌入模型的混合式相关缺陷关联方法

张洋王涛吴逸文尹刚王怀民《软件学报》2019,30(5):1407-1421

社交化编程使得开源社区中的知识可以快速被传播,其中,缺陷报告作为一类重要的软件开发知识,会含有特定的语义信息.通常,开发者会人工地将相关的缺陷报告关联起来.在一个软件项目中,发现并关联相关的缺陷报告可以为开发者提供更多的资源和信息去解决目标缺陷,从而提高缺陷修复效率.然而,现有人工关联缺陷报告的方法是十分耗费时间的,它在很大程度上取决于开发者自身的经验和知识.因此,研究如何及时、高效地关联相关缺陷是对于提高软件开发效率十分有意义的工作.将这类关联相关缺陷的问题视为推荐问题,并提出了一种基于嵌入模型的混合式相关缺陷关联方法,将传统的信息检索技术（TF-IDF）与深度学习中的嵌入模型（词嵌入模型和文档嵌入模型）结合起来.实验结果表明,该方法能够有效地提高传统方法的性能,且具有较强的应用扩展性. 相似文献

5.

High-Impact Bug Report Identification with Imbalanced Learning Strategies

下载免费PDF全文

Xin-Li Yang David Lo Xin Xia Qiao Huang Jian-Ling Sun 《计算机科学技术学报》2017,32(1):181-198

In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedule and limited human resources, developers may not have enough time to inspect all bugs. Thus, they often concentrate on bugs that are highly impactful. In the literature, high-impact bugs are used to refer to the bugs which appear at unexpected time or locations and bring more unexpected effects (i.e., surprise bugs), or break pre-existing functionalities and destroy the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs from thousands of bug reports in a bug tracking system is not an easy feat. Thus, an automated technique that can identify high-impact bug reports can help developers to be aware of them early, rectify them quickly, and minimize the damages they cause. Considering that only a small proportion of bugs are high-impact bugs, the identification of high-impact bug reports is a difficult task. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each of which combines one particular imbalanced learning strategy and one particular classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms to conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs, i.e., surprise bugs and breakage bugs. The results show that different variants have different performances, and the best performing variants SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification outperform the F1-scores of the two state-of-the-art approaches by Thung et al. and Garcia and Shihab. 相似文献

6.

Efficient and exact duplicate detection on cloud

Chuitian Rong Wei Lu Xiaoyong Du Xiao Zhang 《Concurrency and Computation》2013,25(15):2187-2206

As the recent proliferation of social networks, mobile applications, and online services increased the rate of data gathering, to find near‐duplicate records efficiently has become a challenging issue. Related works on this problem mainly aim to propose efficient approaches on a single machine. However, when processing large‐scale dataset, the performance to identify duplicates is still far from satisfactory. In this paper, we try to handle the problem of duplicate detection applying MapReduce. We argue that the performance of utilizing MapReduce to detect duplicates mainly depends on the number of candidate record pairs and intermediate result size, which is related to the shuffle cost among different nodes in cluster. In this paper, we proposed a new signature scheme with new pruning strategies to minimize the number of candidate pairs and intermediate result size. The proposed solution is an exact one, which assures none duplicate record pair can be lost. The experimental results over both real and synthetic datasets demonstrate that our proposed signature‐based method is efficient and scalable. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献

7.

A systematic literature review to identify and classify software requirement errors

Gursimran Singh Walia Jeffrey C. Carver 《Information and Software Technology》2009,51(7):1087-1109

Most software quality research has focused on identifying faults (i.e., information is incorrectly recorded in an artifact). Because software still exhibits incorrect behavior, a different approach is needed. This paper presents a systematic literature review to develop taxonomy of errors (i.e., the sources of faults) that may occur during the requirements phase of software lifecycle. This taxonomy is designed to aid developers during the requirement inspection process and to improve overall software quality. The review identified 149 papers from the software engineering, psychology and human cognition literature that provide information about the sources of requirements faults. A major result of this paper is a categorization of the sources of faults into a formal taxonomy that provides a starting point for future research into error-based approaches to improving software quality. 相似文献

8.

Duplicate detection in adverse drug reaction surveillance

G. Niklas Norén Roland Orre Andrew Bate I. Ralph Edwards 《Data mining and knowledge discovery》2007,14(3):305-328

The WHO Collaborating Centre for International Drug Monitoring in Uppsala, Sweden, maintains and analyses the world’s largest database of reports on suspected adverse drug reaction (ADR) incidents that occur after drugs are on the market. The presence of duplicate case reports is an important data quality problem and their detection remains a formidable challenge, especially in the WHO drug safety database where reports are anonymised before submission. In this paper, we propose a duplicate detection method based on the hit-miss model for statistical record linkage described by Copas and Hilton, which handles the limited amount of training data well and is well suited for the available data (categorical and numerical rather than free text). We propose two extensions of the standard hit-miss model: a hit-miss mixture model for errors in numerical record fields and a new method to handle correlated record fields, and we demonstrate the effectiveness both at identifying the most likely duplicate for a given case report (94.7% accuracy) and at discriminating true duplicates from random matches (63% recall with 71% precision). The proposed method allows for more efficient data cleaning in post-marketing drug safety data sets, and perhaps other knowledge discovery applications as well. Responsible editor: Hannu Toivonen. 相似文献

9.

An empirical study on the efficiency of different design pattern representations in UML class diagrams

Gerardo Cepeda Porras Yann-Gaël Guéhéneuc 《Empirical Software Engineering》2010,15(5):493-522

Design patterns are recognized in the software engineering community as useful solutions to recurring design problems that improve the quality of programs. They are more and more used by developers in the design and implementation of their programs. Therefore, the visualization of the design patterns used in a program could be useful to efficiently understand how it works. Currently, a common representation to visualize design patterns is the UML collaboration notation. Previous work noticed some limitations in the UML representation and proposed new representations to tackle these limitations. However, none of these pieces of work conducted empirical studies to compare their new representations with the UML representation. We designed and conducted an empirical study to collect data on the performance of developers on basic tasks related to design pattern comprehension (i.e., identifying composition, role, participation) to evaluate the impact of three visual representations and to compare them with the UML one. We used eye-trackers to measure the developers’ effort during the execution of the study. Collected data and their analyses show that stereotype-enhanced UML diagrams are more efficient for identifying composition and role than the UML collaboration notation. The UML representation and the pattern-enhanced class diagrams are more efficient for locating the classes participating in a design pattern (i.e., identifying participation). 相似文献

10.

Creating probabilistic databases from duplicated data

Oktie Hassanzadeh Renée J. Miller 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(5):1141-1166

A major source of uncertainty in databases is the presence of duplicate items, i.e., records that refer to the same real-world entity. However, accurate deduplication is a difficult task and imperfect data cleaning may result in loss of valuable information. A reasonable alternative approach is to keep duplicates when the correct cleaning strategy is not certain, and utilize an efficient probabilistic query-answering technique to return query results along with probabilities of each answer being correct. In this paper, we present a flexible modular framework for scalably creating a probabilistic database out of a dirty relation of duplicated data and overview the challenges raised in utilizing this framework for large relations of string data. We study the problem of associating probabilities with duplicates that are detected using state-of-the-art scalable approximate join methods. We argue that standard thresholding techniques are not sufficiently robust for this task, and propose new clustering algorithms suitable for inferring duplicates and their associated probabilities. We show that the inferred probabilities accurately reflect the error in duplicate records. 相似文献

11.

Effects of stability on model composition effort: an exploratory study

Kleinner Farias Alessandro Garcia Carlos Lucena 《Software and Systems Modeling》2014,13(4):1473-1494

Model composition plays a central role in many software engineering activities, e.g., evolving design models to add new features. To support these activities, developers usually rely on model composition heuristics. The problem is that the models to-be-composed usually conflict with each other in several ways and such composition heuristics might be unable to properly deal with all emerging conflicts. Hence, the composed model may bear some syntactic and semantic inconsistencies that should be resolved. As a result, the production of the intended model is an error-prone and effort-consuming task. It is often the case that developers end up examining all parts of the output composed model instead of prioritizing the most critical ones, i.e., those that are likely to be inconsistent with the intended model. Unfortunately, little is known about indicators that help developers (1) to identify which model is more likely to exhibit inconsistencies, and (2) to understand which composed models require more effort to be invested. It is often claimed that software systems remaining stable over time tends to have a lower number of defects and require less effort to be fixed than unstable systems. However, little is known about the effects of software stability in the context of model evolution when supported by composition heuristics. This paper, therefore, presents an exploratory study analyzing stability as an indicator of inconsistency rate and resolution effort on model composition activities. Our findings are derived from 180 compositions performed to evolve design models of three software product lines. Our initial results, supported by statistical tests, also indicate which types of changes led to lower inconsistency rate and lower resolution effort. 相似文献

12.

Studying the impact of uncertainty in operational release planning – An integrated method and its initial evaluation

Ahmed Al-Emran Puneet Kapur Dietmar Pfahl Guenther Ruhe 《Information and Software Technology》2010,52(4):446-461

ContextUncertainty is an unavoidable issue in software engineering and an important area of investigation. This paper studies the impact of uncertainty on total duration (i.e., make-span) for implementing all features in operational release planning.ObjectiveThe uncertainty factors under investigation are: (1) the number of new features arriving during release construction, (2) the estimated effort needed to implement features, (3) the availability of developers, and (4) the productivity of developers.MethodAn integrated method is presented combining Monte-Carlo simulation (to model uncertainty in the operational release planning (ORP) process) with process simulation (to model the ORP process steps and their dependencies as well as an associated optimization heuristic representing an organization-specific staffing policy for make-span minimization). The method allows for evaluating the impact of uncertainty on make-span. The impact of uncertainty factors both in isolation and in combination are studied in three different pessimism levels through comparison with a baseline plan. Initial evaluation of the method is done by an explorative case study at Chartwell Technology Inc. to demonstrate its applicability and its usefulness.ResultsThe impact of uncertainty on release make-span increases – both in terms of magnitude and variance – with an increase of pessimism level as well as with an increase of the number of uncertainty factors. Among the four uncertainty factors, we found that the strongest impact stems from the number of new features arriving during release construction. We have also demonstrated that for any combination of uncertainty factors their combined (i.e., simultaneous) impact is bigger than the addition of their individual impacts.ConclusionThe added value of the presented method is that managers are able to study the impact of uncertainty on existing (i.e., baseline) operational release plans pro-actively. 相似文献

13.

The Expressive Power of Complex Values in Object-Based Data Models

Vandenbussche J. Paredaens J. 《Information and Computation》1995,120(2)

In object-based data models, complex values such as tuples or sets have no special status and must therefore be represented by objects. As a consequence, different objects may represent the same value, i.e., duplicates may occur. This paper contains a study of the precise expressive power required for the representation of complex values in typical object-based data models supporting first-order queries, object creation, and while-leaps. Such models are sufficiently powerful to express any reasonable collection of complex values, provided duplicates are allowed. It is shown that in general, the presence of such duplicates is unavoidable in the case of set values. In contrast, duplicates of tuple values can easily be eliminated. A fundamental operation for elimination of duplicate set values, called abstraction, is considered and shown to be a tractable alternative to explicit powerset construction. Other means of avoiding duplicates, such as total order, equality axioms, or copy elimination, are also discussed. 相似文献

14.

Improving bug management using correlations in crash reports

Shaohua Wang Foutse Khomh Ying Zou 《Empirical Software Engineering》2016,21(2):337-367

Nowadays, many software organizations rely on automatic problem reporting tools to collect crash reports directly from users’ environments. These crash reports are later grouped together into crash types. Usually, developers prioritize crash types based on the number of crash reports and file bug reports for the top crash types. Because a bug can trigger a crash in different usage scenarios, different crash types are sometimes related to the same bug. Two bugs are correlated when the occurrence of one bug causes the other bug to occur. We refer to a group of crash types related to identical or correlated bug reports, as a crash correlation group. In this paper, we propose five rules to identify correlated crash types automatically. We propose an algorithm to locate and rank buggy files using crash correlation groups. We also propose a method to identify duplicate and related bug reports. Through an empirical study on Firefox and Eclipse, we show that the first three rules can identify crash correlation groups using stack trace information, with a precision of 91 % and a recall of 87 % for Firefox and a precision of 76 % and a recall of 61 % for Eclipse. On the top three buggy file candidates, the proposed bug localization algorithm achieves a recall of 62 % and a precision of 42 % for Firefox, and a recall of 52 % and a precision of 50 % for Eclipse. On the top 10 buggy file candidates, the recall increases to 92 % for Firefox and 90 % for Eclipse. The proposed duplicate bug report identification method achieves a recall of 50 % and a precision of 55 % on Firefox, and a recall of 47 % and a precision of 35 % on Eclipse. Developers can combine the proposed crash correlation rules with the new bug localization algorithm to identify and fix correlated crash types all together. Triagers can use the duplicate bug report identification method to reduce their workload by filtering duplicate bug reports automatically. 相似文献

15.

Using error abstraction and classification to improve requirement quality: conclusions from a family of four empirical studies

Gursimran S. Walia Jeffrey C. Carver 《Empirical Software Engineering》2013,18(4):625-658

Achieving high software quality is a primary concern for software development organizations. Researchers have developed many quality improvement methods that help developers detect faults early in the lifecycle. To address some of the limitations of fault-based quality improvement approaches, this paper describes an approach based on errors (i.e. the sources of the faults). This research extends Lanubile et al.’s, error abstraction process by providing a formal requirement error taxonomy to help developers identify both faults and errors. The taxonomy was derived from the software engineering and psychology literature. The error abstraction and classification process and the requirement error taxonomy are validated using a family of four empirical studies. The main conclusions derived from the four studies are: (1) the error abstraction and classification process is an effective approach for identifying faults; (2) the requirement error taxonomy is useful addition to the error abstraction process; and (3) deriving requirement errors from cognitive psychology research is useful. 相似文献

16.

基于循环神经网络的缺陷报告分派方法

席圣渠姚远徐锋吕建《软件学报》2018,29(8):2322-2335

随着开源软件项目规模的不断增大,人工为缺陷报告分派合适的开发人员（缺陷分派）变得越来越困难.而不合适的缺陷分派往往会严重影响缺陷修复的效率,为此迫切需要一种缺陷分派辅助技术帮助项目管理者更好地完成缺陷分派任务.当前,大部分研究工作都基于缺陷报告文本以及相关元数据信息分析来刻画开发者的特征,忽略了对开发者活跃度的考虑,使得对具有相似特征的开发者进行缺陷报告分派预测时表现较差.本文提出了一个基于循环神经网络的深度学习模型DeepTriage,一方面利用双向循环网络加池化方法提取缺陷报告的文本特征,一方面利用单向循环网络提取特定时刻的开发者活跃度特征,并融合两者,利用已修复的缺陷报告进行监督学习.在Eclipse等四个不同的开源项目数据集上的实验结果表明,DeepTriage较同类工作在缺陷分派预测准确率上有显著提升. 相似文献

17.

Duplicate record identification in bibliographic databases

《Information Systems》1987,12(3):239-242

This study presents the applicability of an automatically generated code for use in duplicate detection in bibliographic databases. It is shown that the methods generate a large percentage of unique codes, and that the code is short enough to be useful. The code would prove to be particularly useful in identifying duplicates when records are added to the database. 相似文献

18.

Detecting Duplicate Contributions in Pull-Based Model Combining Textual and Change Similarities

下载免费PDF全文

Zhi-Xing Li Yue Yu Tao Wang Gang Yin Xin-Jun Mao Huai-Min Wang 《计算机科学技术学报》2021,36(1):191-206

Communication and coordination between open source software(OSS)developers who do not work physically in the same location have always been the challenging issues.The pull-based development model,as the state-of-the-art collaborative development mechanism,provides high openness and transparency to improve the visibility of contributors'work.However,duplicate contributions may still be submitted by more than one contributor to solve the same problem due to the parallel and uncoordinated nature of this model.If not detected in time,duplicate pull-requests can cause contributors and reviewers to waste time and energy on redundant work.In this paper,we propose an approach combining textual and change similarities to automatically detect duplicate contributions in the pull-based model at submission time.For a new-arriving contribution,we first compute textual similarity and change similarity between it and other existing contributions.And then our method returns a list of candidate duplicate contributions that are most similar to the new contribution in terms of the combined textual and change similarity.The evaluation shows that 83.4％of the duplicates can be found in average when we use the combined textual and change similarity compared with 54.8％using only textual similarity and 78.2％using only change similarity. 相似文献

19.

Prior-free rare category detection: More effective and efficient solutions

《Expert systems with applications》2014,41(17):7691-7706

Identifying statistically significant anomalies in an unlabeled data set is of key importance in many applications such as financial security and remote sensing. Rare category detection (RCD) helps address this issue by passing candidate data examples to a labeling oracle (e.g., a human expert) for labeling. A challenging task in RCD is to discover all categories without any prior information about the given data set. A few approaches have been proposed to address this issue, which are on quadratic or cubic time complexities w.r.t. the data set size N and require considerable labeling queries involving time-consuming and expensive labeling effort of a human expert. In this paper, aiming at solutions with lower time complexity and less labeling queries, we propose two prior-free (i.e., without any prior information about a given data set) RCD algorithms, namely (1) iFRED which achieves linear time complexity w.r.t. N, and (2) vFRED which substantially reduces the number of labeling queries. This is done by tabulating each dimension of the data set into bins, followed by zooming out to shrink each bin down to a position and conducting wavelet analysis on the data density function to fast locate the position (i.e., a bin) of a rare category, and zooming in the located bin to select candidate data examples for labeling. Theoretical analysis guarantees the effectiveness of our algorithms, and comprehensive experiments on both synthetic and real data sets further verify the effectiveness and efficiency. 相似文献

20.

Studying high impact fix-inducing changes

Ayse Tosun Misirli Emad Shihab Yasukata Kamei 《Empirical Software Engineering》2016,21(2):605-641

As software systems continue to play an important role in our daily lives, their quality is of paramount importance. Therefore, a plethora of prior research has focused on predicting components of software that are defect-prone. One aspect of this research focuses on predicting software changes that are fix-inducing. Although the prior research on fix-inducing changes has many advantages in terms of highly accurate results, it has one main drawback: It gives the same level of impact to all fix-inducing changes. We argue that treating all fix-inducing changes the same is not ideal, since a small typo in a change is easier to address by a developer than a thread synchronization issue. Therefore, in this paper, we study high impact fix-inducing changes (HIFCs). Since the impact of a change can be measured in different ways, we first propose a measure of impact of the fix-inducing changes, which takes into account the implementation work that needs to be done by developers in later (fixing) changes. Our measure of impact for a fix-inducing change uses the amount of churn, the number of files and the number of subsystems modified by developers during an associated fix of the fix-inducing change. We perform our study using six large open source projects to build specialized models that identify HIFCs, determine the best indicators of HIFCs and examine the benefits of prioritizing HIFCs. Using change factors, we are able to predict 56 % to 77 % of HIFCs with an average false alarm (misclassification) rate of 16 %. We find that the lines of code added, the number of developers who worked on a change, and the number of prior modifications on the files modified during a change are the best indicators of HIFCs. Lastly, we observe that a specialized model for HIFCs can provide inspection effort savings of 4 % over the state-of-the-art models. We believe our results would help practitioners prioritize their efforts towards the most impactful fix-inducing changes and save inspection effort. 相似文献