Software managers are routinely confronted with software projects that contain errors or inconsistencies and exceed budget and time limits. By mining software repositories with comprehensible data mining techniques, predictive models can be induced that offer software managers the insights they need to tackle these quality and budgeting problems efficiently. This paper deals with the role that the Ant Colony Optimization (ACO)-based classification technique AntMiner+ can play as a comprehensible data mining technique to predict erroneous software modules. In an empirical comparison on three real-world public datasets, the rule-based models produced by AntMiner+ are shown to achieve a predictive accuracy that is competitive with that of the models induced by several other classification techniques, such as C4.5, logistic regression and support vector machines. In addition, we argue that the intuitiveness and comprehensibility of the AntMiner+ models can be considered superior to that of the latter models.

Just-in-time defect prediction can prompt software developers and managers to verify and fix bugs as soon as they are introduced, thus improving the effectiveness and validity of bug fixing. Existing studies mainly focus on just-in-time prediction for software files (JIT-F). JIT-F is a binary classification problem, which classifies (and hence predicts) a file change as buggy or clean. This article provides a detailed analysis of just-in-time defect prediction for software hunks (JIT-H), which predicts bugs at a finer level of granularity and hence further improves the efficiency of bug fixing. Classification is performed using the ensemble technique of bagging: aggregated combinations of random undersampling plus multiple classifiers (J48 and Random Forest). An empirical study with 10 open source projects was conducted to validate the effectiveness of JIT-H. Experimental results show that JIT-H is effective at predicting defects in software hunk changes. Compared with JIT-F, JIT-H is more cost-effective. Additionally, analysis of the change features indicates that Text Vector features and hunk-change-level features are more important than features in other groups and levels.
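The combination of bagging and random undersampling mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the study's implementation: the base learner here is a toy nearest-centroid classifier standing in for J48 and Random Forest, and all function names (`undersample`, `fit_bagging`, `predict_bagging`) are hypothetical.

```python
import random
from collections import Counter

def undersample(X, y, rng):
    """Randomly drop majority-class rows until both classes have equal size."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_centroids(X, y):
    """Toy base learner (an assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroids(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def fit_bagging(X, y, n_models=11, seed=0):
    """Train each ensemble member on its own balanced random subsample."""
    rng = random.Random(seed)
    return [fit_centroids(*undersample(X, y, rng)) for _ in range(n_models)]

def predict_bagging(models, x):
    """Aggregate the members' predictions by majority vote."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Six clean hunks (label 0) and two buggy hunks (label 1); features are made up.
X = [[1, 1], [2, 1], [1, 2], [2, 2], [1.5, 1.5], [2, 1.5], [9, 9], [10, 9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
models = fit_bagging(X, y)
```

Because every member sees a balanced sample, the vote is not dominated by the much larger clean class, which is the motivation for pairing undersampling with bagging on imbalanced change data.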

Software reliability is one of the most important software quality indicators. It is concerned with the probability that the software can execute without any unintended behavior in a given environment. In previous research we developed the Reliability Prediction System (RePS) methodology to predict the reliability of safety-critical software such as that used in the nuclear industry. A RePS methodology relates software engineering measures to software reliability using various models, and it was found that RePSs using Extended Finite State Machine (EFSM) models and fault data collected through various software engineering measures possess the most satisfactory prediction capability. In this research the EFSM-based RePS methodology is improved and implemented in a tool called Automated Reliability Prediction System (ARPS). The features of the ARPS tool are introduced with a simple case study. An experiment using human subjects was also conducted to evaluate the usability of the tool, and the results demonstrate that the ARPS tool can indeed help the analyst apply the EFSM-based RePS methodology with fewer errors and lower error criticality.

Knowing, prior to system operations, which program modules are problematic is valuable to a software quality assurance team, especially when there is a constraint on software quality enhancement resources. A cost-effective approach for allocating such resources is to obtain a prediction in the form of a quality-based ranking of program modules. Subsequently, a module-order model (MOM) is used to gauge the performance of the predicted rankings. From a practical software engineering point of view, multiple software quality objectives may be desired of a MOM for the system under consideration: e.g., the desired rankings may be such that 100% of the faults should be detected if the top 50% of modules with the highest number of faults are subjected to quality improvements. Moreover, the management team for the same system may also desire that 80% of the faults be accounted for if the top 20% of the modules are targeted for improvement. Existing work related to MOMs uses a quantitative prediction model to obtain the predicted rankings of program modules, implying that only fault prediction error measures such as the average, relative, or mean square errors are minimized. Such an approach does not provide direct insight into the performance behavior of a MOM. For a given percentage of modules enhanced, the performance of a MOM is gauged by how many faults are accounted for by the predicted ranking as compared with the perfect ranking. We propose an approach for calibrating a multiobjective MOM using genetic programming. Other estimation techniques, e.g., multiple linear regression and neural networks, cannot achieve multiobjective optimization for MOMs. The proposed methodology facilitates the simultaneous optimization of multiple performance objectives for a MOM. Case studies of two industrial software systems are presented, the empirical results of which demonstrate new promise for goal-oriented software quality modeling.
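The performance measure described in this abstract, faults accounted for by the predicted ranking compared with the perfect ranking at a given cutoff, can be sketched as below. The function names and the rounding rule used to turn a percentage into a module count are assumptions made for illustration.

```python
def faults_captured(faults, order, cutoff):
    """Share of all faults located in the top `cutoff` fraction of modules in `order`."""
    n_top = max(1, round(cutoff * len(order)))  # rounding rule is an assumption
    return sum(faults[i] for i in order[:n_top]) / sum(faults)

def mom_performance(actual_faults, predicted_scores, cutoff):
    """Ratio of faults captured by the predicted ranking to those captured by the perfect ranking."""
    idx = range(len(actual_faults))
    predicted_order = sorted(idx, key=lambda i: predicted_scores[i], reverse=True)
    perfect_order = sorted(idx, key=lambda i: actual_faults[i], reverse=True)
    return (faults_captured(actual_faults, predicted_order, cutoff)
            / faults_captured(actual_faults, perfect_order, cutoff))

# Hypothetical data: five modules, their true fault counts, and a model's scores.
actual = [10, 0, 5, 1, 4]
scores = [0.9, 0.8, 0.1, 0.2, 0.7]
ratio = mom_performance(actual, scores, cutoff=0.4)
```

With these made-up numbers, the predicted top two modules hold 10 of 20 faults while the perfect top two hold 15, so the ratio is 2/3. A multiobjective calibration would evaluate this ratio at several cutoffs (for example 20% and 50%) at once.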



Software defect prediction studies usually build models using within-company data, and very few focus on prediction models trained with cross-company data. Models built on within-company data are difficult to employ in practice because such local data repositories are often lacking. Recently, transfer learning has attracted more and more attention for building a classifier in a target domain using data from a related source domain. It is very useful when the distributions of training and test instances differ, but is it appropriate for cross-company software defect prediction?


In this paper, we consider the cross-company defect prediction scenario, in which source and target data are drawn from different companies. To harness cross-company data, we exploit transfer learning to build a faster and highly effective prediction model.


Unlike prior work that selects training data similar to the test data, we propose a novel algorithm called Transfer Naive Bayes (TNB), which uses the information of all the proper features in the training data. Our solution estimates the distribution of the test data and transfers cross-company data information into the weights of the training data. The defect prediction model is then built on these weighted data.


This article presents a theoretical analysis of the compared methods and shows experimental results on data sets from different organizations. The results indicate that TNB is more accurate in terms of AUC (the area under the receiver operating characteristic curve) and requires less runtime than the state-of-the-art methods.


It is concluded that when there are too few local training data to train good classifiers, useful feature-level knowledge from training data with a different distribution may help. We are optimistic that our transfer learning method can guide optimal resource allocation strategies, which may reduce software testing cost and increase the effectiveness of the software testing process.
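The idea of turning test-set information into weights on cross-company training instances can be sketched as follows. The abstract does not give the weighting scheme, so both the range-based similarity count and the formula `s / (k - s + 1) ** 2` are assumptions chosen for illustration, and all function names are hypothetical.

```python
def feature_ranges(test_rows):
    """Per-feature (min, max) observed in the unlabeled target (test) data."""
    return [(min(col), max(col)) for col in zip(*test_rows)]

def similarity(row, ranges):
    """Number of features of a training row that fall inside the test-data range."""
    return sum(lo <= v <= hi for v, (lo, hi) in zip(row, ranges))

def instance_weight(row, ranges):
    """Assumed weighting: grows with similarity s, shrinks with the k - s mismatches."""
    k = len(ranges)
    s = similarity(row, ranges)
    return s / (k - s + 1) ** 2

def weighted_class_prior(weights, labels, target):
    """Class prior estimated from instance weights instead of raw counts."""
    return sum(w for w, y in zip(weights, labels) if y == target) / sum(weights)

# Hypothetical target-company rows and cross-company training rows (two metrics each).
test_rows = [[1, 10], [3, 20]]
train_rows = [[2, 15], [2, 50], [9, 50]]
train_labels = [1, 0, 0]

ranges = feature_ranges(test_rows)
weights = [instance_weight(r, ranges) for r in train_rows]
```

A weighted naive Bayes model would use such weights in place of raw counts when estimating priors and per-feature likelihoods, so that training rows resembling the target data count for more.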

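To improve the performance of prediction models and resolve the disagreements caused by different attribute subsets, a prediction model based on partial correlation is proposed. First, the method analyzes public datasets and finds a partial correlation between static code attributes and the number of defects; then, based on the partial correlation coefficients, it computes a code complexity density attribute value; finally, a new defect prediction model is built on this attribute value. Experiments show that the model achieves high recall and good F-measure performance, which further confirms the conclusion that the partial correlation between code attributes and module defects is an important factor affecting the performance of software quality prediction. This conclusion helps in building more stable and reliable software defect prediction models.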
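For readers unfamiliar with partial correlation, the standard first-order coefficient between a code attribute x and the defect count y, controlling for a third attribute z, can be computed as in the sketch below. The abstract does not say which attributes are controlled for, so the variable roles and sample values are purely illustrative.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y with the linear effect of z removed."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Illustrative values: z is uncorrelated with both x and y,
# so the partial correlation equals the ordinary correlation of x and y.
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
z = [1, -1, -1, 1]
r = partial_corr(x, y, z)
```

When z is correlated with both x and y, the coefficient differs from the ordinary correlation, which is what makes it useful for separating an attribute's own relationship with defects from effects shared with other attributes.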

New methodologies and tools have gradually made the life cycle for software development more human-independent. Much of the research in this field focuses on defect reduction, defect identification and defect prediction. Defect prediction is a relatively new research area that uses various methods ranging from artificial intelligence to data mining. Identifying and locating defects in software projects is a difficult task. Measuring software in a continuous and disciplined manner provides many advantages, such as accurate estimation of project costs and schedules and improved product and process quality. This study proposes a model to predict the number of defects in the new version of a software product with respect to the previous stable version. The new version may contain changes related to a new feature, a modification of an algorithm, or bug fixes. Our proposed model aims to predict the new defects introduced into the new version by analyzing the types of changes in an objective and formal manner and by considering the change in lines of code (LOC). Defect predictors are helpful tools for both project managers and developers. Accurate predictors may help reduce test times and guide developers towards implementing higher-quality code. Our proposed model can aid software engineers in determining the stability of software before it goes into production. Furthermore, such a model may provide useful insight into the effects of a feature, bug fix or change on the process of defect detection.
Ayşe Basar Bener

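The least squares support vector machine (LS-SVM) algorithm is introduced to address software defect prediction. The hyperparameter selection process is accelerated, and a fast method is given for adding new samples one at a time to recalibrate the model. Using software complexity metrics as the basis, a software defect prediction model based on FLS-SVM is built. A concrete example illustrates how the model is executed and shows that, with small samples, its predictive ability is better than that of neural networks; the regression equation is also used to identify the complexity metrics that significantly affect software defects.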

Innovations in Systems and Software Engineering - Unlike several other engineering disciplines, software engineering lacks well-defined research strategies. However, with the exponential rise in...

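A huge and diverse body of open source software resources has formed on the Internet. How to judge accurately and quickly whether the various trustworthiness attributes of an open source project meet requirements is a current research focus in software engineering. Through an in-depth analysis of existing open source software evaluation models and a summary of the software-quality-related information available on the Internet, a trustworthiness evaluation evidence framework for open source software is proposed, and a trustworthiness evidence query platform for open source software is built on this framework. The platform can greatly improve evaluation efficiency, letting users learn about the relevant software projects accurately, quickly, and comprehensively. Finally, a well-known open source project is used to confirm the feasibility of the evidence framework and the evidence query platform.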

Modern service-oriented enterprise systems have increasingly complex and dynamic loosely-coupled architectures that often exhibit poor performance and resource efficiency and have high operating costs. This is due to the inability to predict at run-time the effect of workload changes on performance-relevant application-level dependencies and adapt the system configuration accordingly. Architecture-level performance models provide a powerful tool for performance prediction; however, current approaches to modeling the context of software components are not suitable for use at run-time. In this paper, we analyze typical online performance prediction scenarios and propose a performance meta-model for (i) expressing and resolving parameter and context dependencies, (ii) modeling service abstractions at different levels of granularity and (iii) modeling the deployment of software components in complex resource landscapes. The presented meta-model is a subset of the Descartes Meta-Model (DMM) for online performance prediction, specifically designed for use in online scenarios. We motivate and validate our approach in the context of realistic and representative online performance prediction scenarios based on the SPECjEnterprise2010 standard benchmark.

This article tackles the problem of predicting the effort (in person-hours) required to fix a software defect posted on an Issue Tracking System. The proposed method is inspired by the Nearest Neighbour Approach presented in the pioneering work of Weiss et al. (2007) [1]. We propose four enhancements to Weiss et al. (2007) [1]: Data Enrichment, Majority Voting, Adaptive Threshold and Binary Clustering. Data Enrichment infuses additional issue information into the similarity-scoring procedure, aiming to increase the accuracy of similarity scores. Majority Voting exploits the fact that many of the similar historical issues have repeating effort values, which are close to the actual effort. Adaptive Threshold automatically adjusts the similarity threshold to ensure that we obtain only the most similar matches. We use Binary Clustering if the similarity scores are very low, which might result in misleading predictions. This uses common properties of issues to form clusters (independent of the similarity scores), which are then used to produce the predictions. Numerical results are presented showing a noticeable improvement over the method proposed in Weiss et al. (2007) [1].
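A nearest-neighbour effort predictor with a similarity threshold and majority voting might look like the sketch below. It is a simplified stand-in, not the article's method: similarity is a plain Jaccard overlap of words (an assumption), the threshold is fixed rather than adaptive, and the issue texts and effort values are invented.

```python
from collections import Counter

def jaccard(a, b):
    """Word-overlap similarity between two issue descriptions (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def predict_effort(new_text, history, threshold=0.3):
    """history: list of (issue_text, effort_hours). Returns None if nothing is similar enough."""
    matches = [effort for text, effort in history
               if jaccard(new_text, text) >= threshold]
    if not matches:
        return None
    value, count = Counter(matches).most_common(1)[0]
    # Majority voting: use the most frequent effort when it repeats;
    # otherwise fall back to the mean of the matched efforts.
    return value if count > 1 else sum(matches) / len(matches)

# Hypothetical issue history.
history = [
    ("login page crashes on submit", 2.0),
    ("login page crashes on empty password", 2.0),
    ("login page slow on submit", 5.0),
    ("export report to pdf fails", 8.0),
]
estimate = predict_effort("login page crashes", history)
```

Here three historical issues pass the threshold, two of them share the effort value 2.0, and that repeated value is returned. Returning `None` when no issue is similar enough leaves room for a fallback such as the clustering step the abstract describes.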

Software defect prediction aims to predict the defect proneness of new software modules from historical defect data so as to improve the quality of a software system. Historical software defect data has a complicated structure and is markedly class-imbalanced; how to fully analyze and utilize the existing historical defect data and build more precise and effective classifiers has attracted considerable interest from researchers in both academia and industry. Multiple kernel learning and ensemble learning are effective techniques in the field of machine learning. Multiple kernel learning can map the historical defect data to a higher-dimensional feature space in which they are better represented, and ensemble learning can use a series of weak classifiers to reduce the bias generated by the majority class and obtain better predictive performance. In this paper, we propose to use multiple kernel learning to predict software defects. By using the characteristics of the metrics mined from open source software, we obtain a multiple kernel classifier through an ensemble learning method, which has the advantages of both multiple kernel learning and ensemble learning. We thus propose a multiple kernel ensemble learning (MKEL) approach for software defect classification and prediction. Considering the cost of risk in software defect prediction, we design a new sample weight vector updating strategy to reduce the cost of risk caused by misclassifying defective modules as non-defective ones. We employ the widely used NASA MDP datasets as test data to evaluate the performance of all compared methods; experimental results show that MKEL outperforms several representative state-of-the-art defect prediction methods.

Context: Software defect prediction (SDP) is an important task in software engineering. Along with estimating the number of defects remaining in software systems and discovering defect associations, classifying the defect-proneness of software modules plays an important role in software defect prediction. Several machine-learning methods have been applied to handle the defect-proneness of software modules as a classification problem. This type of “yes” or “no” decision is an important drawback in the decision-making process and, if not precise, may lead to misclassifications. To the best of our knowledge, existing approaches rely on fully automated module classification and do not provide a way to incorporate extra knowledge during the classification process. This knowledge can be helpful in avoiding misclassifications in cases where system modules cannot be classified in a reliable way. Objective: We seek to develop an SDP method that (i) incorporates a reject option in the classifier to improve the reliability of the decision-making process; and (ii) makes it possible to postpone the final decision on rejected modules for an expert analysis or even for another classifier using extra domain knowledge. Method: We develop an SDP method called rejoELM and its variant, IrejoELM. Both methods were built upon the weighted extreme learning machine (ELM) with a reject option that makes it possible to postpone the final decision on non-classified modules, the rejected ones, to a later moment. While rejoELM aims to maximize the accuracy for a given rejection rate, IrejoELM maximizes the F-measure. Hence, IrejoELM becomes an alternative for classification with reject option on imbalanced datasets. Results: rejoELM and IrejoELM are tested on five datasets of source code metrics extracted from real-world open-source software projects. Results indicate that rejoELM has an accuracy for several rejection rates that is comparable to that of some state-of-the-art classifiers with reject option. Although IrejoELM shows lower accuracies for several rejection rates, it clearly outperforms all other methods when the F-measure is used as the performance metric. Conclusion: It is concluded that rejoELM is a valid alternative for classification with reject option problems when classes are nearly equally represented. On the other hand, IrejoELM is shown to be the best alternative for classification with reject option on imbalanced datasets. Since SDP problems are usually characterized as imbalanced learning problems, the use of IrejoELM is recommended.
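The reject option described in this abstract can be illustrated independently of the ELM itself. The sketch below assumes some classifier already supplies an estimated probability that a module is defective; the symmetric rejection band around 0.5, the parameter name `reject_band`, and the function names are hypothetical choices, not taken from rejoELM or IrejoELM.

```python
def classify_with_reject(p_defective, reject_band=0.2):
    """Return 'defective', 'clean', or 'reject' given an estimated P(defective)."""
    # Scores too close to the decision boundary are withheld for an expert
    # or a second classifier instead of being forced into a yes/no answer.
    if abs(p_defective - 0.5) < reject_band / 2:
        return "reject"
    return "defective" if p_defective > 0.5 else "clean"

def rejection_rate(probabilities, reject_band=0.2):
    """Fraction of modules whose decision is postponed."""
    decisions = [classify_with_reject(p, reject_band) for p in probabilities]
    return decisions.count("reject") / len(decisions)

# Hypothetical scores for four modules.
scores = [0.9, 0.55, 0.1, 0.45]
decisions = [classify_with_reject(p) for p in scores]
```

Widening `reject_band` raises the rejection rate and usually raises accuracy on the modules that are still classified, which is the trade-off such methods evaluate across several rejection rates.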

Prediction of fault-prone modules provides one way to support software quality engineering through improved scheduling and project control. The primary goal of our research was to develop and refine techniques for early prediction of fault-prone modules. The objective of this paper is to review and improve an approach previously examined in the literature for building prediction models, i.e. principal component analysis (PCA) and discriminant analysis (DA). We present findings of an empirical study at Ericsson Telecom AB for which the previous approach was found inadequate for predicting the most fault-prone modules using software design metrics. Instead of dividing modules into fault-prone and not-fault-prone, modules are categorized into several groups according to the ordered number of faults. It is shown that the first discriminant coordinates (DC) statistically increase with the ordering of modules, thus improving prediction and prioritization efforts. The authors also experienced problems with the smoothing parameter as used previously for DA. To correct this problem and further improve predictability, separate estimation of the smoothing parameter is shown to be required.

Context: Software defect prediction during software development has recently attracted the attention of many researchers. Predicting the software defect density indicator in each phase of the software development life cycle (SDLC) is desirable for developing a reliable software product. Software defect prediction at the end of the testing phase may be of limited benefit, because changes that must be made in earlier phases of the SDLC may require a large amount of money and effort to achieve the target software quality. Therefore, a phase-wise software defect density indicator prediction model is of great importance. Objective: In this paper, a fuzzy-logic-based phase-wise software defect prediction model is proposed using the most reliability-relevant metrics of each phase of the SDLC. Method: In the proposed model, the defect density indicator in the requirement analysis, design, coding and testing phases is predicted using nine software metrics of these four phases. The defect density indicator predicted at the end of each phase is also taken as an input to the next phase. Software metrics are assessed in linguistic terms, and a fuzzy inference system is employed to develop the model. Results: The predictive accuracy of the proposed model is validated using data from twenty real software projects. Validation results are satisfactory. Measures based on the mean magnitude of relative error and the balanced mean magnitude of relative error decrease significantly as the software project size increases. Conclusion: In this paper, a fuzzy-logic-based model is proposed for predicting the software defect density indicator at each phase of the SDLC. The predicted defects of twenty different software projects are found to be very close to the actual defects detected during testing. The predicted defect density indicators are very helpful for analyzing the defect severity in different artifacts of the SDLC of a software project.
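The two validation measures named in this abstract can be computed as in the sketch below, using their commonly cited definitions: MMRE divides each absolute error by the actual value, and BMMRE divides it by the smaller of the actual and predicted values. That the paper uses exactly these forms is an assumption, and the sample numbers are invented.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def bmmre(actual, predicted):
    """Balanced MMRE: mean of |actual - predicted| / min(actual, predicted)."""
    return sum(abs(a - p) / min(a, p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical defect counts for two projects (all values must be positive).
actual_defects = [10, 20]
predicted_defects = [8, 25]

mmre_value = mmre(actual_defects, predicted_defects)
bmmre_value = bmmre(actual_defects, predicted_defects)
```

Dividing by the minimum penalizes over- and underestimates symmetrically, whereas MMRE is more lenient towards underestimates. Both functions assume strictly positive values.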

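Life-cycle-based software defect prediction technique
To ensure software reliability and quality, a software defect prediction method using a PCA-BP fuzzy neural network is proposed on the basis of the software development life cycle. Considering the various factors that affect software reliability, metrics affecting software reliability were selected according to relevant standards and engineering practice. Metric data were collected from a class of flight control software in a real engineering project, the proposed model was used to predict defects, and the predictions were compared with those computed by a traditional BP neural network model. The comparison shows that, relative to the BP-neural-network-based prediction method, the PCA-BP neural network prediction method, which incorporates principal component analysis, converges faster and predicts more accurately.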
