首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 23 毫秒
1.
Software product quality can be enhanced significantly if we have a good knowledge and understanding of the potential faults therein. This paper describes a study to build predictive models to identify parts of the software that have high probability of occurrence of fault. We have considered the effect of thresholds of object‐oriented metrics on fault proneness and built predictive models based on the threshold values of the metrics used. Prediction of fault prone classes in earlier phases of software development life cycle will help software developers in allocating the resources efficiently. In this paper, we have used a statistical model derived from logistic regression to calculate the threshold values of object oriented, Chidamber and Kemerer metrics. Thresholds help developers to alarm the classes that fall outside a specified risk level. In this way, using the threshold values, we can divide the classes into two levels of risk – low risk and high risk. We have shown threshold effects at various risk levels and validated the use of these thresholds on a public domain, proprietary dataset, KC1 obtained from NASA and two open source, Promise datasets, IVY and JEdit using various machine learning methods and data mining classifiers. Interproject validation has also been carried out on three different open source datasets, Ant and Tomcat and Sakura. This will provide practitioners and researchers with well formed theories and generalised results. The results concluded that the proposed threshold methodology works well for the projects of similar nature or having similar characteristics.  相似文献   

2.
为了提高软件的可靠性,软件缺陷预测已经成为软件工程领域中一个重要的研究方向.传统的软件缺陷预测方法主要是设计静态代码度量,并用机器学习分类器来预测代码的缺陷概率.但是,静态代码度量未能充分考虑到潜藏在代码中的语义特征.根据这种状况,本文提出了一种基于深度卷积神经网络的软件缺陷预测模型.首先,从源代码的抽象语法树中选择合适的结点提取表征向量,并构建字典将其映射为整数向量以方便输入到卷积神经网络.然后,基于GoogLeNet设计卷积神经网络,利用卷积神经网络的深度挖掘数据的能力,充分挖掘出特征中的语法语义特征.另外,模型使用了随机过采样的方法来处理数据分类不均衡问题,并在网络中使用丢弃法来防止模型过拟合.最后,用Promise上的历史工程数据来测试模型,并以AUC和F1-measure为指标与其他3种方法进行了比较,实验结果显示本文提出的模型在软件缺陷预测性能上得到了一定的提升.  相似文献   

3.
A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code. However, collecting and analyzing such a large quantity of source code presents a number of challenges. Although the current generation of open source code search engines provides access to the source code in an aggregated repository, they generally fail to take advantage of the rich structural information contained in the code they index. This makes them significantly less useful than Sourcerer for building state-of-the-art software engineering tools, as these tools often require access to both the structural and textual information available in source code.We have developed Sourcerer, an infrastructure for large-scale collection and analysis of open source code. By taking full advantage of the structural information extracted from source code in its repository, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer.  相似文献   

4.
软件缺陷预测可帮助开发人员提前预测缺陷程序,合理分配有限的测试资源。软件缺陷预测的准确度不仅依赖于预测方法的选择,更依赖于软件的度量指标。因此,结合多元度量指标进行软件缺陷预测已成为当前的研究热点。从度量指标出发,对传统度量指标、多元度量指标以及结合多元度量指标的缺陷预测的研究进展进行了系统介绍。主要工作包含:介绍了传统的代码和过程度量指标、基于传统度量指标的软件缺陷预测模型以及影响数据质量的因素;阐述了语义结构度量指标;分析列举了当前用于软件缺陷预测的评价指标;结合预测粒度、传统度量指标、语义结构度量指标、跨项目软件缺陷预测对多元度量指标软件缺陷预测未来的研究趋势进行了展望。  相似文献   

5.
When coding to an application programming interface (API), developers often encounter difficulties, unsure of which class to subclass, which objects to instantiate, and which methods to call. Example source code that demonstrates the use of the API can help developers make progress on their task. This paper describes an approach to provide such examples in which the structure of the source code that the developer is writing is matched heuristically to a repository of source code that uses the API. The structural context needed to query the repository is extracted automatically from the code, freeing the developer from learning a query language or from writing their code in a particular style. The repository is generated automatically from existing applications, avoiding the need for handcrafted examples. We demonstrate that the approach is effective, efficient, and more reliable than traditional alternatives through four empirical studies  相似文献   

6.
Correcting software defects accounts for a significant amount of resources in a software project. To make best use of testing efforts, researchers have studied statistical models to predict in which parts of a software system future defects are likely to occur. By studying the mathematical relations between predictor variables used in these models, researchers can form an increased understanding of the important connections between development activities and software quality. Predictor variables used in past top-performing models are largely based on source code-oriented metrics, such as lines of code or number of changes. However, source code is the end product of numerous interlaced and collaborative activities carried out by developers. Traces of such activities can be found in the various repositories used to manage development efforts. In this paper, we develop statistical models to study the impact of social interactions in a software project on software quality. These models use predictor variables based on social information mined from the issue tracking and version control repositories of two large open-source software projects. The results of our case studies demonstrate the impact of metrics from four different dimensions of social interaction on post-release defects. Our findings show that statistical models based on social information have a similar degree of explanatory power as traditional models. Furthermore, our results demonstrate that social information does not substitute, but rather augments traditional source code-based metrics used in defect prediction models.  相似文献   

7.
ContextCode ownership metrics were recently defined in order to distinguish major and minor contributors of a software module, and to assess whether the ownership of such a module is strong or shared between developers.ObjectiveThe relationship between these metrics and software quality was initially validated on proprietary software projects. Our objective in this paper is to evaluate such relationship in open-source software projects, and to compare these metrics to other code and process metrics.MethodOn a newly crafted dataset of seven open-source software projects, we perform, using inferential statistics, an analysis of code ownership metrics and their relationship with software quality.ResultsWe confirm the existence of a relationship between code ownership and software quality, but the relative importance of ownership metrics in multiple linear regression models is low compared to metrics such as the number of lines of code, the number of modifications performed over the last release, or the number of developers of a module.ConclusionAlthough we do find a relationship between code ownership and software quality, the added value of ownership metrics compared to other metrics is still to be proven.  相似文献   

8.
We describe a method to use the source code change history of a software project to drive and help to refine the search for bugs. Based on the data retrieved from the source code repository, we implement a static source code checker that searches for a commonly fixed bug and uses information automatically mined from the source code repository to refine its results. By applying our tool, we have identified a total of 178 warnings that are likely bugs in the Apache Web server source code and a total of 546 warnings that are likely bugs in Wine, an open-source implementation of the Windows API. We show that our technique is more effective than the same static analysis that does not use historical data from the source code repository.  相似文献   

9.
In this study, defect tracking is used as a proxy method to predict software readiness. The number of remaining defects in an application under development is one of the most important factors that allow one to decide if a piece of software is ready to be released. By comparing predicted number of faults and number of faults discovered in testing, software manager can decide whether the software is likely ready to be released or not.The predictive model developed in this research can predict: (i) the number of faults (defects) likely to exist, (ii) the estimated number of code changes required to correct a fault and (iii) the estimated amount of time (in minutes) needed to make the changes in respective classes of the application. The model uses product metrics as independent variables to do predictions. These metrics are selected depending on the nature of source code with regards to architecture layers, types of faults and contribution factors of these metrics. The use of neural network model with genetic training strategy is introduced to improve prediction results for estimating software readiness in this study. This genetic-net combines a genetic algorithm with a statistical estimator to produce a model which also shows the usefulness of inputs.The model is divided into three parts: (1) prediction model for presentation logic tier (2) prediction model for business tier and (3) prediction model for data access tier. Existing object-oriented metrics and complexity software metrics are used in the business tier prediction model. New sets of metrics have been proposed for the presentation logic tier and data access tier. These metrics are validated using data extracted from real world applications. The trained models can be used as tools to assist software mangers in making software release decisions.  相似文献   

10.
11.
Software defect prediction aims to predict the defect proneness of new software modules with the historical defect data so as to improve the quality of a software system. Software historical defect data has a complicated structure and a marked characteristic of class-imbalance; how to fully analyze and utilize the existing historical defect data and build more precise and effective classifiers has attracted considerable researchers’ interest from both academia and industry. Multiple kernel learning and ensemble learning are effective techniques in the field of machine learning. Multiple kernel learning can map the historical defect data to a higher-dimensional feature space and make them express better, and ensemble learning can use a series of weak classifiers to reduce the bias generated by the majority class and obtain better predictive performance. In this paper, we propose to use the multiple kernel learning to predict software defect. By using the characteristics of the metrics mined from the open source software, we get a multiple kernel classifier through ensemble learning method, which has the advantages of both multiple kernel learning and ensemble learning. We thus propose a multiple kernel ensemble learning (MKEL) approach for software defect classification and prediction. Considering the cost of risk in software defect prediction, we design a new sample weight vector updating strategy to reduce the cost of risk caused by misclassifying defective modules as non-defective ones. We employ the widely used NASA MDP datasets as test data to evaluate the performance of all compared methods; experimental results show that MKEL outperforms several representative state-of-the-art defect prediction methods.  相似文献   

12.
随着软件的演化,软件规模和复杂性的上升往往会造成代码设计质量的退化,进而导致代码可维护性下降。已有大量软件度量指标用于量化代码设计质量,但是由于数量众多,不同指标体系度量结果可比性较差,使得开发人员难以找到存在设计质量问题的模块。系统调研并整理归类了现有的规模、耦合、内聚、封装、继承和多态等6方面软件度量指标。结合度量指标关系以及实验分析,从每个方面的指标集中发现了与代码设计质量关联度最高的6个指标,从而提出一种面向代码设计质量监控的软件度量指标集。实验表明,该指标集可以有效挑选出存在代码设计质量问题的类,可作为开发人员监控和定位代码设计问题的重点关注指标。  相似文献   

13.
In defect prediction studies, open-source and real-world defect data sets are frequently used. The quality of these data sets is one of the main factors affecting the validity of defect prediction methods. One of the issues is repeated data points in defect prediction data sets. The main goal of the paper is to explore how low-level metrics are derived. This paper also presents a cleansing algorithm that removes repeated data points from defect data sets. The method was applied on 20 data sets, including five open source sets, and area under the curve (AUC) and precision performance parameters have been improved by 4.05% and 6.7%, respectively. In addition, this work discusses how static code metrics should be used in bug prediction. The study provides tips to obtain better defect prediction results.  相似文献   

14.
Object-oriented (OO) metrics are used mainly to predict software engineering activities/efforts such as maintenance effort, error proneness, and error rate. There have been discussions about the effectiveness of metrics in different contexts. In this paper, we present an empirical study of OO metrics in two iterative processes: the short-cycled agile process and the long-cycled framework evolution process. We find that OO metrics are effective in predicting design efforts and source lines of code added, changed, and deleted in the short-cycled agile process and ineffective in predicting the same aspects in the long-cycled framework process. This leads us to believe that OO metrics' predictive capability is limited to the design and implementation changes during the development iterations, not the long-term evolution of an established system in different releases.  相似文献   

15.
Object-oriented metrics aim to exhibit the quality of source code and give insight to it quantitatively. Each metric assesses the code from a different aspect. There is a relationship between the quality level and the risk level of source code. The objective of this paper is to empirically examine whether or not there are effective threshold values for source code metrics. It is targeted to derive generalized thresholds that can be used in different software systems. The relationship between metric thresholds and fault-proneness was investigated empirically in this study by using ten open-source software systems. Three types of fault-proneness were defined for the software modules: non-fault-prone, more-than-one-fault-prone, and more-than-three-fault-prone. Two independent case studies were carried out to derive two different threshold values. A single set was created by merging ten datasets and was used as training data by the model. The learner model was created using logistic regression and the Bender method. Results revealed that some metrics have threshold effects. Seven metrics gave satisfactory results in the first case study. In the second case study, eleven metrics gave satisfactory results. This study makes contributions primarily for use by software developers and testers. Software developers can see classes or modules that require revising; this, consequently, contributes to an increment in quality for these modules and a decrement in their risk level. Testers can identify modules that need more testing effort and can prioritize modules according to their risk levels.  相似文献   

16.
随着区块链技术的兴起,智能合约安全问题被越来越多的研究者和企业重视,目前已有一些针对智能合约缺陷检测技术的研究.软件缺陷预测技术是软件缺陷检测技术的有效补充,能够优化测试资源分配,提高软件测试效率.然而,目前还没有针对智能合约的软件缺陷预测研究.针对这一问题,提出了面向Solidity智能合约的缺陷预测方法.首先,设计了一组针对Solidity智能合约特有的变量、函数、结构和Solidity语言特性的度量元集(smart contract-Solidity, SC-Sol度量元集),并将其与重点考虑面向对象特征的度量元集(code complexity and features of object-oriented program, COOP度量元集)组合为COOP-SC-Sol度量元集.然后,从Solidity智能合约代码中提取相关度量元信息,并结合缺陷检测结果,构建Solidity智能合约缺陷数据集.在此基础上,应用了7种回归模型和6种分类模型进行Solidity智能合约的缺陷预测,以验证不同度量元集和不同模型在缺陷数量和倾向性预测上的性能差异.实验结果表明,相对于COOP度量元集...  相似文献   

17.

Context

Modern software engineering demands professionals and researchers to proactively and collectively work towards exploring and experimenting viable and valuable mechanisms in order to extract all kinds of degenerative bugs, security holes, and possible deviations at the initial stage. Having understood the real need here, we have introduced a novel methodology for the estimation of defect proneness of class structures in object oriented (OO) software systems at design stage.

Objective

The objective of this work is to develop an estimation model that provides significant assessment of defect proneness of object oriented software packages at design phase of SDLC. This frame work enhances the efficiency of SDLC through design quality improvement.

Method

This involves a data driven methodology which is based on the empirical study of the relationship existing between design parameters and defect proneness. In the first phase, a mapping of the relationship between the design metrics and normal occurrence pattern of defects are carried out. This is represented as a set of non linear multifunctional regression equations which reflects the influence of individual design metrics on defect proneness. The defect proneness estimation model is then generated by weighted linear combination of these multifunctional regression equations. The weighted coefficients are evaluated through GQM (Goal Question Metric) paradigm.

Results

The model evaluation and validation is carried out with a selected set of cases which is found to be promising. The current study is successfully dealt with three projects and it opens up the opportunity to extend this to a wide range of projects across industries.

Conclusion

The defect proneness estimation at design stage facilitates an effective feedback to the design architect and enabling him to identify and reduce the number of defects in the modules appropriately. This results in a considerable improvement in software design leading to cost effective products.  相似文献   

18.
Empirical validation of software metrics suites to predict fault proneness in object-oriented (OO) components is essential to ensure their practical use in industrial settings. In this paper, we empirically validate three OO metrics suites for their ability to predict software quality in terms of fault-proneness: the Chidamber and Kemerer (CK) metrics, Abreu's Metrics for Object-Oriented Design (MOOD), and Bansiya and Davis' Quality Metrics for Object-Oriented Design (QMOOD). Some CK class metrics have previously been shown to be good predictors of initial OO software quality. However, the other two suites have not been heavily validated except by their original proposers. Here, we explore the ability of these three metrics suites to predict fault-prone classes using defect data for six versions of Rhino, an open-source implementation of JavaScript written in Java. We conclude that the CK and QMOOD suites contain similar components and produce statistical models that are effective in detecting error-prone classes. We also conclude that the class components in the MOOD metrics suite are not good class fault-proneness predictors. Analyzing multivariate binary logistic regression models across six Rhino versions indicates these models may be useful in assessing quality in OO classes produced using modern highly iterative or agile software development processes.  相似文献   

19.
张献  贲可荣  曾杰 《软件学报》2021,32(7):2219-2241
软件缺陷预测是软件质量保障领域的一个活跃话题,它可以帮助开发人员发现潜在的缺陷并更好地利用资源.如何为预测系统设计更具判别力的度量元,并兼顾性能与可解释性,一直是人们致力于的研究方向.针对这一挑战,提出了一种基于代码自然性特征的缺陷预测方法——CNDePor.该方法通过正逆双向度量代码和利用质量信息对样本加权的方式改进语言模型,提高了模型所得交叉熵(CE)类度量元的缺陷判别力.针对粗粒度缺陷预测存在难以聚焦缺陷区域、代码审查成本高的不足,研究了一种新的细粒度缺陷预测问题——面向语句的切片级缺陷预测.在此问题上,设计了4种度量元,并在两类安全缺陷数据集上验证了度量元和CNDePor方法的有效性.实验结果表明:CE类度量元具有可学习性,它们蕴涵了语言模型从语料库中学习到的相关知识;改进的CE类度量元的判别力明显优于原始度量元和传统规模度量元;CNDePor方法较传统缺陷预测方法和已有的基于代码自然性的方法有显著优势,较先进的基于深度学习的方法具有可比性性能和更强的可解释性.  相似文献   

20.
Many empirical studies have found that software metrics can predict class error proneness and the prediction can be used to accurately group error-prone classes. Recent empirical studies have used open source systems. These studies, however, focused on the relationship between software metrics and class error proneness during the development phase of software projects. Whether software metrics can still predict class error proneness in a system’s post-release evolution is still a question to be answered. This study examined three releases of the Eclipse project and found that although some metrics can still predict class error proneness in three error-severity categories, the accuracy of the prediction decreased from release to release. Furthermore, we found that the prediction cannot be used to build a metrics model to identify error-prone classes with acceptable accuracy. These findings suggest that as a system evolves, the use of some commonly used metrics to identify which classes are more prone to errors becomes increasingly difficult and we should seek alternative methods (to the metric-prediction models) to locate error-prone classes if we want high accuracy.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号