Similar Articles
20 similar articles found.
1.
Predicting the location and number of faults in large software systems
Advance knowledge of which files in the next release of a large software system are most likely to contain the largest numbers of faults can be a very valuable asset. To accomplish this, a negative binomial regression model has been developed and used to predict the expected number of faults in each file of the next release of a system. The predictions are based on the code of the file in the current release, and on the fault and modification history of the file from previous releases. The model has been applied to two large industrial systems, one with a history of 17 consecutive quarterly releases over 4 years, and the other with nine releases over 2 years. The predictions were quite accurate: for each release of the two systems, the 20 percent of the files with the highest predicted number of faults contained between 71 percent and 92 percent of the faults that were actually detected, with the overall average being 83 percent. The same model was also used to predict which files of the first system were likely to have the highest fault densities (faults per KLOC). In this case, the 20 percent of the files with the highest predicted fault densities contained an average of 62 percent of the system's detected faults. However, the identified files contained a much smaller percentage of the code mass than the files selected to maximize the numbers of faults. The model was also used to make predictions from a much smaller input set that only contained fault data from integration testing and later. The prediction was again very accurate, identifying files that contained from 71 percent to 93 percent of the faults, with the average being 84 percent. Finally, a highly simplified version of the predictor selected files containing, on average, 73 percent and 74 percent of the faults for the two systems.
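As a purely illustrative sketch (not taken from the paper), a negative binomial regression of this kind could be fitted in Python with statsmodels along the following lines; the predictor names (kloc, prior_faults, prior_changes) and all data values are hypothetical stand-ins for the code and history measures described above.

    # Hedged sketch: fit a negative binomial regression on per-file data and
    # rank the files of the next release by predicted fault count.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical training data: one row per file from past releases.
    train = pd.DataFrame({
        "faults":        [3, 0, 7, 1, 0, 5, 2, 0],
        "kloc":          [2.1, 0.4, 5.0, 1.2, 0.3, 3.8, 1.9, 0.6],
        "prior_faults":  [1, 0, 4, 0, 0, 2, 1, 0],
        "prior_changes": [5, 1, 9, 2, 0, 6, 4, 1],
    })

    # Negative binomial GLM with a log link (alpha is a fixed dispersion value here).
    model = smf.glm(
        "faults ~ kloc + prior_faults + prior_changes",
        data=train,
        family=sm.families.NegativeBinomial(alpha=1.0),
    ).fit()

    # Score the files of the next release and flag the top 20% for inspection.
    next_release = train.drop(columns="faults")        # stand-in for new-release data
    next_release["predicted_faults"] = model.predict(next_release)
    top20 = next_release.nlargest(max(1, len(next_release) // 5), "predicted_faults")
    print(top20)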

2.
Fault prediction by negative binomial regression models is shown to be effective for four large production software systems from industry. A model developed originally with data from systems with regularly scheduled releases was successfully adapted to a system without releases to identify 20% of that system’s files that contained 75% of the faults. A model with a pre-specified set of variables derived from earlier research was applied to three additional systems, and proved capable of identifying averages of 81, 94 and 76% of the faults in those systems. A primary focus of this paper is to investigate the impact on predictive accuracy of using data about the number of developers who access individual code units. For each system, including the cumulative number of developers who had previously modified a file yielded no more than a modest improvement in predictive accuracy. We conclude that while many factors can “spoil the broth” (lead to the release of software with too many defects), the number of developers is not a major influence.

3.
Just-in-time defect prediction can remind software developers and managers to verify and fix bugs at the moment they appear, thus improving the effectiveness and validity of bug fixing. Existing studies mainly focus on just-in-time prediction for software files (JIT-F). JIT-F is a binary classification problem, which classifies (hence predicts) a file change as buggy or clean. This article provides a detailed analysis of just-in-time defect prediction for software hunks (JIT-H), which predicts bugs at a finer level of granularity, and hence further improves the efficiency of bug fixing. Classification is performed using the ensemble technique of bagging—aggregated combinations of random under-sampling plus multiple classifiers (J48 and Random Forest). An empirical study with 10 open source projects was conducted to validate the effectiveness of JIT-H. Experimental results show that JIT-H is effective at predicting defects in software hunk changes. Compared with JIT-F, JIT-H is more cost effective. Additionally, analysis of the change features indicates that Text Vector features and hunk-change-level features are more important than features in other groups and levels.
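A minimal sketch of the ensemble idea described above, assuming made-up hunk-change features and labels; a Random Forest stands in for the J48/Random Forest base learners, and each bag is trained on a randomly under-sampled, class-balanced subset.

    # Hedged sketch: bagging over randomly under-sampled, balanced training sets.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))             # made-up hunk-change features
    y = (rng.random(200) < 0.15).astype(int)  # imbalanced labels: 1 = buggy hunk

    def undersample(X, y, rng):
        buggy = np.flatnonzero(y == 1)
        clean = rng.choice(np.flatnonzero(y == 0), size=len(buggy), replace=False)
        idx = np.concatenate([buggy, clean])
        return X[idx], y[idx]

    ensemble = []
    for seed in range(10):                    # 10 balanced bags
        Xb, yb = undersample(X, y, np.random.default_rng(seed))
        ensemble.append(RandomForestClassifier(n_estimators=50, random_state=seed).fit(Xb, yb))

    # Aggregate by averaging the predicted buggy-probabilities across the bags.
    probs = np.mean([clf.predict_proba(X)[:, 1] for clf in ensemble], axis=0)
    print("hunks flagged as buggy:", int((probs >= 0.5).sum()))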

4.
谭鑫  林泽燕  张宇霞  周明辉 《软件学报》2018,29(8):2283-2293
During software development, the same code file is often developed and maintained by multiple developers, each of whom contributes a different amount of code, giving the file its own contribution composition. Whether a file's contribution composition is reasonable directly affects how developer tasks are assigned, and in turn affects software quality and development efficiency. How to characterize contribution composition and determine reasonable composition patterns for different types of code files is therefore a pressing problem. Because mature collaborative development tools now record developer activity effectively, the massive data they produce provides a foundation for data-driven, intelligent software development. First, based on code ownership, three measures are proposed to characterize contribution composition along three dimensions: concentration, complexity, and stability. Second, taking Nova, a core project of OpenStack, as a case study, these measures are computed on its version control data; twelve common file types are summarized and three contribution composition patterns are identified. Finally, through emails and face-to-face interviews, the validity of the measures and the reasonableness of the patterns are verified, and some guidance on the software development process is offered from the perspective of contribution composition.
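As a hypothetical illustration of a concentration-style measure over a file's contribution composition (the paper's actual three measures are defined differently in detail), one could compute a Herfindahl-type index over per-author line shares:

    # Hedged sketch: concentration of a file's contribution composition.
    # Herfindahl index of per-author line shares: 1.0 = single owner, 1/k = k equal contributors.
    lines_by_author = {"dev_a": 820, "dev_b": 120, "dev_c": 60}   # hypothetical file history

    total = sum(lines_by_author.values())
    shares = [n / total for n in lines_by_author.values()]
    concentration = sum(s * s for s in shares)

    top_owner, top_lines = max(lines_by_author.items(), key=lambda kv: kv[1])
    print(f"concentration={concentration:.2f}, top owner={top_owner} ({top_lines / total:.0%})")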

5.
We compare the effectiveness of four modeling methods—negative binomial regression, recursive partitioning, random forests and Bayesian additive regression trees—for predicting the files likely to contain the most faults for 28 to 35 releases of three large industrial software systems. Predictor variables included lines of code, file age, faults in the previous release, changes in the previous two releases, and programming language. To compare the effectiveness of the different models, we use two metrics—the percent of faults contained in the top 20% of files identified by the model, and a new, more general metric, the fault-percentile-average. The negative binomial regression and random forests models performed significantly better than recursive partitioning and Bayesian additive regression trees, as assessed by either of the metrics. For each of the three systems, the negative binomial and random forests models identified 20% of the files in each release that contained an average of 76% to 94% of the faults.
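A small worked sketch of the two evaluation measures, using the commonly cited definition of the fault-percentile-average (the average, over all detected faults, of the percentile rank of the file containing them); the scores and fault counts are invented.

    # Hedged sketch of the two evaluation measures mentioned above.
    import numpy as np

    predicted = np.array([9.1, 4.2, 3.0, 2.5, 1.1, 0.9, 0.4, 0.2, 0.1, 0.05])  # model scores
    actual    = np.array([12,  1,   6,   0,   3,   0,   2,   0,   1,   0])      # true fault counts

    order = np.argsort(-predicted)                 # files ranked from most to least risky
    ranked_faults = actual[order]

    # Metric 1: percent of faults in the top 20% of files.
    k = max(1, int(0.2 * len(actual)))
    top20_pct = ranked_faults[:k].sum() / actual.sum()

    # Metric 2: fault-percentile-average.
    n = len(actual)
    percentiles = (n - np.arange(n)) / n           # top-ranked file -> 1.0, last -> 1/n
    fpa = (ranked_faults * percentiles).sum() / actual.sum()

    print(f"top-20% faults: {top20_pct:.0%}, fault-percentile-average: {fpa:.2f}")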

6.
Finding security vulnerabilities requires a different mindset than finding general faults in software—thinking like an attacker. Therefore, security engineers looking to prioritize security inspection and testing efforts may be better served by a prediction model that indicates security vulnerabilities rather than faults. At the same time, faults and vulnerabilities have commonalities that may allow development teams to use traditional fault prediction models and metrics for vulnerability prediction. The goal of our study is to determine whether fault prediction models can be used for vulnerability prediction or if specialized vulnerability prediction models should be developed when both models are built with traditional metrics of complexity, code churn, and fault history. We have performed an empirical study on a widely-used, large open source project, the Mozilla Firefox web browser, where 21% of the source code files have faults and only 3% of the files have vulnerabilities. Both the fault prediction model and the vulnerability prediction model provide similar ability in vulnerability prediction across a wide range of classification thresholds. For example, the fault prediction model provided recall of 83% and precision of 11% at classification threshold 0.6 and the vulnerability prediction model provided recall of 83% and precision of 12% at classification threshold 0.5. Our results suggest that fault prediction models based upon traditional metrics can substitute for specialized vulnerability prediction models. However, both fault prediction and vulnerability prediction models require significant improvement to reduce false positives while providing high recall.

7.
Software defect prediction helps developers identify defect-prone programs in advance and allocate limited testing resources sensibly. The accuracy of defect prediction depends not only on the choice of prediction method but, even more, on the software metrics used; combining multiple kinds of metrics for defect prediction has therefore become a current research focus. Starting from the metrics themselves, this paper systematically reviews traditional metrics, multivariate metrics, and defect prediction that combines multivariate metrics. The main contributions are: an introduction to traditional code and process metrics, to defect prediction models based on them, and to the factors affecting data quality; a description of semantic and structural metrics; an analysis and enumeration of the evaluation measures currently used for software defect prediction; and an outlook on future research in defect prediction with multivariate metrics, covering prediction granularity, traditional metrics, semantic and structural metrics, and cross-project defect prediction.

8.
This paper presents an empirical case study that predicted faults in modules based on the total information content of the operators. This metric is closely related to Harrison's average information content classification (AICC), which is the entropy of the operators. Most information theory-based metrics proposed in the literature have not been subjected to empirical predictive studies of real-world software systems. In contrast, this study shows that a simple information theory-based metric can be more useful for prediction of software quality than comparable metrics based on counts in the context of a commercial software development organization. Three models were considered, all based on operators as an abstraction of software. The model based on information content of the operators made more accurate predictions than two similar models based on the number of operators and the number of unique operators. The purpose of this paper is a fair comparison of the three metrics, rather than developing an optimal model. We have long advocated multivariate models for industrial use. The case study considered three large commercial systems, written in assembly language, and developed consecutively by professional programmers. The first system was used to estimate parameters of the models. The subsequent two were used to evaluate the accuracy of model predictions.
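A minimal sketch of the metric being compared: the entropy of a module's operator frequencies (average information content in bits) and its total information content, alongside the two count-based alternatives; the operator stream is a toy example.

    # Hedged sketch: entropy and total information content of a module's operators,
    # next to the two count-based metrics mentioned above.
    import math
    from collections import Counter

    operators = ["MOV", "ADD", "MOV", "CMP", "JNE", "MOV", "ADD", "RET"]  # toy operator stream

    counts = Counter(operators)
    n = len(operators)                       # number of operators
    n_unique = len(counts)                   # number of unique operators

    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())  # bits per operator
    total_information = n * entropy          # total information content of the operators

    print(n, n_unique, round(entropy, 3), round(total_information, 2))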

9.
In this study, defect tracking is used as a proxy method to predict software readiness. The number of remaining defects in an application under development is one of the most important factors that allow one to decide if a piece of software is ready to be released. By comparing the predicted number of faults with the number of faults discovered in testing, the software manager can decide whether the software is likely ready to be released. The predictive model developed in this research can predict: (i) the number of faults (defects) likely to exist, (ii) the estimated number of code changes required to correct a fault and (iii) the estimated amount of time (in minutes) needed to make the changes in respective classes of the application. The model uses product metrics as independent variables to do predictions. These metrics are selected depending on the nature of the source code with regards to architecture layers, types of faults and contribution factors of these metrics. The use of a neural network model with a genetic training strategy is introduced to improve prediction results for estimating software readiness in this study. This genetic-net combines a genetic algorithm with a statistical estimator to produce a model which also shows the usefulness of inputs. The model is divided into three parts: (1) a prediction model for the presentation logic tier, (2) a prediction model for the business tier and (3) a prediction model for the data access tier. Existing object-oriented metrics and complexity software metrics are used in the business tier prediction model. New sets of metrics have been proposed for the presentation logic tier and data access tier. These metrics are validated using data extracted from real world applications. The trained models can be used as tools to assist software managers in making software release decisions.

10.
High-assurance and complex mission-critical software systems are heavily dependent on the reliability of their underlying software applications. Early software fault prediction is a proven technique for achieving high software reliability. Prediction models based on software metrics can predict the number of faults in software modules. Timely predictions of such models can be used to direct cost-effective quality enhancement efforts to modules that are likely to have a high number of faults. We evaluate the predictive performance of six commonly used fault prediction techniques: CART-LS (least squares), CART-LAD (least absolute deviation), S-PLUS, multiple linear regression, artificial neural networks, and case-based reasoning. The case study consists of software metrics collected over four releases of a very large telecommunications system. Performance metrics, average absolute and average relative errors, are utilized to gauge the accuracy of different prediction models. Models were built using both the original software metrics (RAW) and their principal components (PCA). Two-way ANOVA randomized-complete block design models with two blocking variables are designed with average absolute and average relative errors as response variables. System release and the model type (RAW or PCA) form the blocking variables and the prediction technique is treated as a factor. Using multiple-pairwise comparisons, the performance order of prediction models is determined. We observe that for both average absolute and average relative errors, the CART-LAD model performs the best while the S-PLUS model is ranked sixth.

11.
The use of data mining in software engineering has been the subject of several research papers, most of which applied historical data to decision-making activities such as cost estimation and the prediction or estimation of product and project attributes. This paper addresses the ability to predict faulty software modules and to correlate faulty modules with product attributes using statistics. Correlations between the attributes and the categorical class variable are studied by generating a pool of records from each dataset, repeatedly selecting two samples from the dataset, and comparing them: the comparison considers whether the module defect attribute changes from faulty to non-faulty (or the opposite) and how the value of each evaluated attribute changes between the two records (equal, larger or smaller). The goal was to determine whether certain attributes consistently accompany a change in module state from faulty to non-faulty, or the opposite. Results indicated that this technique can be very useful for studying the correlation between each attribute and the defect status attribute. A further prediction algorithm is developed based on statistics of the module and of the overall dataset; it produces, for each attribute, separate predictions for the true class and the faulty class. We found that splitting each attribute's predictive capability in this way (i.e. correct versus faulty module prediction) makes it easier to understand the impact of attribute values on the class and improves the overall prediction relative to previous studies and data mining algorithms. Results were evaluated and compared with other algorithms and previous studies, using ROC metrics to evaluate the performance of the developed measures. These results showed that accuracy calculated in the traditional way, as correctly predicted records divided by the total number of records in the dataset, does not necessarily give the best indication of a metric's or algorithm's predictive power, and may be misleading if other measures are not considered alongside it; the ROC metrics reveal other important aspects of performance or accuracy.

12.
Context: The knowledge about particular characteristics of software that are indicators for defects is very valuable for testers because it helps them to focus the testing effort and to allocate their limited resources appropriately. Objective: In this paper, we explore the relationship between several historical characteristics of files and their defect count. Method: For this purpose, we propose an empirical approach that uses statistical procedures and visual representations of the data in order to determine indicators for a file's defect count. We apply this approach to nine open source Java projects across different versions. Results: Only 4 of 9 programs show moderate correlations between a file's defects in previous and in current releases in more than half of the analysed releases. In contrast to our expectations, the oldest files represent the most fault-prone files. Additionally, late changes correlate with a file's defect count only partly. The number of changes, the number of distinct authors performing changes to a file, as well as the file's age, are good indicators for a file's defect count in all projects. Conclusion: Our results show that a software's history is a good indicator for its quality. We did not find one indicator that persists across all projects in an equal manner. Nevertheless, there are several indicators that show significant strong correlations in nearly all projects: DA (number of distinct authors) and FC (frequency of change). In practice, for each software, statistical analyses have to be performed in order to evaluate the best indicator(s) for a file's defect count.
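As an illustration of the kind of per-project statistical analysis the conclusion recommends, one could correlate candidate indicators such as DA and FC with per-file defect counts using a rank correlation (Spearman is an assumption here, not necessarily the paper's exact procedure):

    # Hedged sketch: correlate per-file history indicators with defect counts.
    import pandas as pd
    from scipy.stats import spearmanr

    files = pd.DataFrame({
        "DA":      [1, 3, 5, 2, 7, 4, 1, 6],          # distinct authors
        "FC":      [2, 9, 14, 4, 20, 11, 1, 17],      # frequency of change
        "AGE":     [1, 5, 8, 2, 9, 6, 1, 7],          # releases since creation
        "defects": [0, 2, 4, 0, 6, 3, 0, 5],
    })

    for indicator in ["DA", "FC", "AGE"]:
        rho, p = spearmanr(files[indicator], files["defects"])
        print(f"{indicator}: rho={rho:.2f}, p={p:.3f}")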

13.
张献  贲可荣  曾杰 《软件学报》2021,32(7):2219-2241
Software defect prediction is an active topic in software quality assurance; it helps developers discover potential defects and make better use of resources. Designing more discriminative metrics for prediction systems while balancing performance and interpretability has long been a goal of this research. To address this challenge, this paper proposes CNDePor, a defect prediction method based on code naturalness features. The method measures code in both forward and reverse directions and improves the language model by weighting samples with quality information, thereby increasing the defect-discriminating power of the cross-entropy (CE) metrics obtained from the model. Because coarse-grained defect prediction has difficulty focusing on defective regions and makes code review costly, a new fine-grained defect prediction problem is studied: statement-oriented, slice-level defect prediction. For this problem, four metrics are designed, and the effectiveness of the metrics and of CNDePor is validated on two kinds of security defect datasets. The experimental results show that the CE metrics are learnable, embodying knowledge the language model acquires from the corpus; the improved CE metrics are clearly more discriminative than the original metrics and than traditional size metrics; and CNDePor has significant advantages over traditional defect prediction methods and existing code-naturalness-based methods, with performance comparable to state-of-the-art deep-learning-based methods and stronger interpretability.
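A minimal sketch of the plain cross-entropy metric that code-naturalness approaches build on: a bigram language model over code tokens assigns each token a probability, and a fragment's cross-entropy is its average negative log-probability. CNDePor's bidirectional, quality-weighted refinements are not reproduced here; the tokens and corpus are toy data.

    # Hedged sketch: cross-entropy of a code fragment under a bigram language model.
    import math
    from collections import Counter

    def train_bigram(corpus_tokens):
        unigrams = Counter(corpus_tokens)
        bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
        vocab = len(set(corpus_tokens))
        def prob(prev, tok):  # add-one smoothed P(tok | prev)
            return (bigrams[(prev, tok)] + 1) / (unigrams[prev] + vocab)
        return prob

    def cross_entropy(tokens, prob):
        logs = [-math.log2(prob(prev, tok)) for prev, tok in zip(tokens, tokens[1:])]
        return sum(logs) / len(logs)

    corpus = "if ( x > 0 ) { return x ; } return 0 ;".split()
    prob = train_bigram(corpus)

    natural = "if ( y > 0 ) { return y ; }".split()
    unusual = "return ; ; } if > 0 return {".split()
    print(cross_entropy(natural, prob), cross_entropy(unusual, prob))  # the unusual fragment scores higher CE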

14.
Nowadays, many software organizations rely on automatic problem reporting tools to collect crash reports directly from users' environments. These crash reports are later grouped together into crash types. Usually, developers prioritize crash types based on the number of crash reports and file bug reports for the top crash types. Because a bug can trigger a crash in different usage scenarios, different crash types are sometimes related to the same bug. Two bugs are correlated when the occurrence of one bug causes the other bug to occur. We refer to a group of crash types related to identical or correlated bug reports as a crash correlation group. In this paper, we propose five rules to identify correlated crash types automatically. We propose an algorithm to locate and rank buggy files using crash correlation groups. We also propose a method to identify duplicate and related bug reports. Through an empirical study on Firefox and Eclipse, we show that the first three rules can identify crash correlation groups using stack trace information, with a precision of 91% and a recall of 87% for Firefox and a precision of 76% and a recall of 61% for Eclipse. On the top three buggy file candidates, the proposed bug localization algorithm achieves a recall of 62% and a precision of 42% for Firefox, and a recall of 52% and a precision of 50% for Eclipse. On the top 10 buggy file candidates, the recall increases to 92% for Firefox and 90% for Eclipse. The proposed duplicate bug report identification method achieves a recall of 50% and a precision of 55% on Firefox, and a recall of 47% and a precision of 35% on Eclipse. Developers can combine the proposed crash correlation rules with the new bug localization algorithm to identify and fix correlated crash types all together. Triagers can use the duplicate bug report identification method to reduce their workload by filtering duplicate bug reports automatically.

15.
Code reviews consist of proofreading proposed code changes in order to find their shortcomings, such as bugs, insufficient test coverage or misused design patterns. Code reviews are conducted before merging submitted changes into the main development branch. The selection of suitable reviewers is crucial to obtaining high-quality reviews. In this article we present a new method of recommending reviewers for code changes. This method is based on profiles of individual programmers. For each developer we maintain his/her profile: the multiset of all file path segments from commits reviewed by him/her, updated whenever he/she completes a new review. We employ a similarity function between such profiles and change proposals to be reviewed. The programmer whose profile best matches the change is recommended to become the reviewer. We performed an experimental comparison of our method against state-of-the-art techniques using four large open-source projects. We obtained improved results in terms of classification metrics (precision, recall and F-measure) and performance (we have lower time and space complexity).
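A minimal sketch of the profile idea, assuming a simple overlap-based similarity (the article's exact similarity function may differ); the developers, paths and change are hypothetical.

    # Hedged sketch: reviewer profiles as multisets of reviewed path segments,
    # matched against a new change with an overlap-based similarity.
    from collections import Counter

    def segments(paths):
        return Counter(seg for p in paths for seg in p.split("/"))

    profiles = {
        "alice": segments(["ui/widgets/button.c", "ui/widgets/menu.c", "ui/theme/colors.c"]),
        "bob":   segments(["net/http/client.c", "net/http/server.c", "net/dns/resolver.c"]),
    }

    def similarity(profile, change):
        overlap = sum(min(profile[s], c) for s, c in change.items())
        return overlap / sum(change.values())

    change = segments(["ui/widgets/slider.c"])          # proposed change to review
    ranking = sorted(profiles, key=lambda r: similarity(profiles[r], change), reverse=True)
    print(ranking)   # reviewers ordered by profile match, e.g. ['alice', 'bob']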

16.
Estimating reliability and the number of faults present in software in its early development phase, i.e., the requirement analysis or design phase, is very beneficial for developing reliable software at optimal cost. Software reliability prediction in the early phase of development is highly desirable to stakeholders, software developers, managers and end users. Since failure data are unavailable in the early phase of software development, different reliability-relevant software metrics and data from similar projects are used to develop models for early software fault prediction. The proposed model uses the linguistic values of software metrics in a fuzzy inference system to predict the total number of faults present in software in its requirement analysis phase. Considering a specific target reliability, the weight of each input software metric and the size of the software, an algorithm is proposed for developing a general fuzzy rule base. For validation of the proposed model, data from 20 real software projects have been used. The linguistic values of four software metrics related to the requirement analysis phase are considered as model inputs. The performance of the proposed model has been compared with two existing early software fault prediction models.

17.
To facilitate developers in effective allocation of their testing and debugging efforts, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict classes that are more likely to be buggy based on the past history of classes, methods, or certain other code elements. These techniques are effective provided that a sufficient amount of data is available to train a prediction model. However, sufficient training data are rarely available for new software projects. To resolve this problem, cross-project defect prediction, which transfers a prediction model trained using data from one project to another, was proposed and is regarded as a new challenge in the area of defect prediction. Thus far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this study, we investigated seven composite algorithms that integrate multiple machine learning classifiers to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, we performed experiments on 10 open-source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. We compared the composite algorithms with the combined defect predictor where logistic regression is used as the meta classification algorithm (CODEP_Logistic), which is the most recent cross-project defect prediction algorithm, in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEP_Logistic: maximum voting shows the best performance in terms of F-measure and its average F-measure is superior to that of CODEP_Logistic by 36.88%. Bootstrap aggregation (BaggingJ48) shows the best performance in terms of cost effectiveness and its average cost effectiveness is superior to that of CODEP_Logistic by 15.34%.
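As an illustrative sketch of two of the composite schemes named above (maximum voting over heterogeneous classifiers and bootstrap aggregation of decision trees), using scikit-learn; the features, labels and projects are placeholders, not the PROMISE data.

    # Hedged sketch: train composite classifiers on source-project data and
    # apply them to a target project.
    import numpy as np
    from sklearn.ensemble import VotingClassifier, BaggingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(1)
    X_src = rng.normal(size=(300, 8))            # metrics from other (source) projects
    y_src = rng.integers(0, 2, size=300)         # defective (1) / clean (0) labels
    X_tgt = rng.normal(size=(50, 8))             # metrics from the target project

    # Maximum (majority) voting over heterogeneous classifiers.
    voting = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("nb", GaussianNB()),
                    ("dt", DecisionTreeClassifier(random_state=1))],
        voting="hard",
    ).fit(X_src, y_src)

    # Bootstrap aggregation; the default base learner is a decision tree (a J48-like model).
    bagging = BaggingClassifier(n_estimators=50, random_state=1).fit(X_src, y_src)

    print(voting.predict(X_tgt)[:10])
    print(bagging.predict(X_tgt)[:10])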

18.
Interactive Fault Localization Using Test Information
Debugging is a time-consuming task in software development. Although various automated approaches have been proposed, they are not effective enough. On the other hand, in manual debugging, developers have difficulty in choosing breakpoints. To address these problems and help developers locate faults effectively, we propose an interactive fault-localization framework, combining the benefits of automated approaches and manual debugging. Until the fault is found, this framework continuously recommends checking points based on statements' suspiciousness, which is calculated from the execution information of test cases and the feedback information from the developer at earlier checking points. We then propose a naive approach, which is an initial implementation of this framework. However, with this naive approach or with manual debugging, developers' wrong estimation of whether the faulty statement is executed before the checking point (breakpoint) may make the debugging process fail. So we propose another, robust approach based on this framework, handling cases where developers make mistakes during the fault-localization process. We performed two experimental studies and the results show that the two interactive approaches are quite effective compared with existing fault-localization approaches. Moreover, the robust approach can help developers find faults even when they make wrong estimations at some checking points.
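A minimal sketch of a statement suspiciousness score computed from test execution information; the Tarantula formula is used here as a stand-in, and the framework's updating of scores from developer feedback at checking points is not reproduced.

    # Hedged sketch: Tarantula-style suspiciousness from test coverage and outcomes.
    # coverage[s][t] = 1 if statement s was executed by test t
    coverage = {
        "s1": [1, 1, 1, 0],
        "s2": [1, 0, 1, 1],
        "s3": [0, 1, 0, 1],
    }
    failed = [1, 0, 0, 1]          # test outcomes: 1 = failed, 0 = passed

    total_failed = sum(failed)
    total_passed = len(failed) - total_failed

    def tarantula(executed):
        ef = sum(e and f for e, f in zip(executed, failed))      # executed by failing tests
        ep = sum(e and not f for e, f in zip(executed, failed))  # executed by passing tests
        fail_ratio = ef / total_failed if total_failed else 0.0
        pass_ratio = ep / total_passed if total_passed else 0.0
        return fail_ratio / (fail_ratio + pass_ratio) if (fail_ratio + pass_ratio) else 0.0

    scores = {s: tarantula(ex) for s, ex in coverage.items()}
    for s in sorted(scores, key=scores.get, reverse=True):       # candidate checking points
        print(s, round(scores[s], 2))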

19.
Software developers rely on a fast build system to incrementally compile their source code changes and produce modified deliverables for testing and deployment. Header files, which tend to trigger slow rebuild processes, are most problematic if they also change frequently during the development process, and hence, need to be rebuilt often. In this paper, we propose an approach that analyzes the build dependency graph (i.e., the data structure used to determine the minimal list of commands that must be executed when a source code file is modified), and the change history of a software system to pinpoint header file hotspots—header files that change frequently and trigger long rebuild processes. Through a case study on the GLib, PostgreSQL, Qt, and Ruby systems, we show that our approach identifies header file hotspots that, if improved, will provide greater improvement to the total future build cost of a system than just focusing on the files that trigger the slowest rebuild processes, change the most frequently, or are used the most throughout the codebase. Furthermore, regression models built using architectural and code properties of source files can explain 32–57% of these hotspots, identifying subsystems that are particularly hotspot-prone and would benefit the most from architectural refinement.
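As a hypothetical illustration of the hotspot intuition (rebuild cost implied by the build dependency graph combined with change frequency from the version history), with a scoring formula that is an assumption rather than the paper's exact cost model:

    # Hedged sketch: rank headers by (number of dependent objects) x (change frequency).
    # Hypothetical build dependencies: header -> translation units that include it.
    dependents = {
        "core/types.h":   ["a.o", "b.o", "c.o", "d.o", "e.o"],
        "net/socket.h":   ["b.o", "c.o"],
        "util/logging.h": ["a.o", "b.o", "c.o", "d.o"],
    }
    changes_last_year = {"core/types.h": 12, "net/socket.h": 40, "util/logging.h": 3}

    hotspot_score = {h: len(objs) * changes_last_year[h] for h, objs in dependents.items()}
    for header, score in sorted(hotspot_score.items(), key=lambda kv: kv[1], reverse=True):
        print(header, score)
    # A header with moderate fan-out but frequent changes (net/socket.h) can outrank
    # one that merely triggers the slowest rebuilds.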

20.
