Similar Articles
20 similar articles found (search time: 31 ms)
1.
Background: Software fault prediction is the process of developing models that software practitioners can use in the early phases of the software development life cycle to detect faulty constructs such as modules or classes. Various machine learning techniques have been used in the past for predicting faults. Method: In this study we perform a systematic review of studies published between January 1991 and October 2013 that use machine learning techniques for software fault prediction. We assess the performance capability of machine learning techniques reported in existing research, compare them with statistical techniques and with one another, and summarize their strengths and weaknesses. Results: We identified 64 primary studies and seven categories of machine learning techniques. The results demonstrate the ability of machine learning techniques to classify modules or classes as fault prone or not fault prone; models built with machine learning techniques for estimating software fault proneness outperform traditional statistical models. Conclusion: Based on the results of the systematic review, we conclude that machine learning techniques can predict software fault proneness and can be used by software practitioners and researchers. However, their application in software fault prediction is still limited, and more studies should be carried out to obtain well-formed and generalizable results. We provide guidelines for practitioners and researchers based on the results obtained in this work.

2.
Software quality engineering comprises several quality assurance activities such as testing, formal verification, inspection, fault tolerance, and software fault prediction. Many researchers have developed and validated fault prediction models using machine learning and statistical techniques, applying different kinds of software metrics and diverse feature reduction techniques to improve model performance. However, these studies did not investigate the effects of dataset size, metrics set, and feature selection techniques on software fault prediction. This study focuses on high-performance fault predictors based on machine learning, such as Random Forests, and on algorithms based on a computational intelligence approach called Artificial Immune Systems. We used public NASA datasets from the PROMISE repository to make our predictive models repeatable, refutable, and verifiable. The research questions address the effects of dataset size, metrics set, and feature selection techniques. To answer them, seven test groups were defined and nine classifiers were examined on each of the five public NASA datasets. According to this study, Random Forests provides the best prediction performance for large datasets and Naive Bayes is the best prediction algorithm for small datasets in terms of the Area Under the Receiver Operating Characteristic Curve (AUC) evaluation parameter. The parallel implementation of the Artificial Immune Recognition System (AIRS2Parallel) is the best Artificial Immune Systems-based algorithm when method-level metrics are used.
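The comparison the study describes, Random Forests versus Naive Bayes evaluated by AUC, can be sketched with scikit-learn. This is a minimal illustration, not the paper's setup: the synthetic data below merely stands in for a NASA PROMISE dataset of module metrics with imbalanced fault labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for a fault dataset: 21 "metrics", ~20% faulty modules.
X, y = make_classification(n_samples=600, n_features=21, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

results = {}
for name, clf in [("RandomForest", RandomForestClassifier(random_state=0)),
                  ("NaiveBayes", GaussianNB())]:
    clf.fit(X_tr, y_tr)
    # AUC uses the predicted fault probability, not the hard label.
    results[name] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {results[name]:.3f}")
```

On real data the ranking depends on dataset size, which is exactly the effect the study measures.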

3.
Empirical validation of software metrics used to predict software quality attributes is important to ensure their practical relevance in software organizations. The aim of this work is to find the relation of object-oriented (OO) metrics with fault proneness at different severity levels of faults. For this purpose, different prediction models were developed using regression and machine learning methods. We evaluate and compare the performance of these methods to find which performs better at each severity level, and empirically validate the OO metrics given by Chidamber and Kemerer. The results of the empirical study are based on a public-domain NASA data set, and the performance of the prediction models was evaluated using Receiver Operating Characteristic (ROC) analysis. The results show that the area under the curve (measured from the ROC analysis) of models predicted using high-severity faults is low compared with that of models predicted with respect to medium- and low-severity faults; however, the number of faults in the classes correctly classified by models for high-severity faults is not low. This study also shows that machine learning methods perform better than the logistic regression method at all fault severities. Based on these results, it is reasonable to claim that models targeted at different severity levels of faults could help in planning and executing testing by focusing resources on the fault-prone parts of the design and code that are likely to cause serious failures.

4.
Nowadays, autism spectrum disorder (ASD) is one of the fastest-growing developmental disorders globally, and screening tests for detecting the signs of autism are time-consuming and expensive. With advances in artificial intelligence and machine learning (ML), autism can be predicted at an early stage. Numerous studies have been conducted using different methods, but none has presented results on the capability to predict autism traits across different age groups. This paper therefore proposes an effective ML-based prediction method and develops a mobile application for predicting ASD in people of any age. The autism prediction model is built from five ML classifiers: Gaussian Naive Bayes, Decision Tree, K-Nearest Neighbors (KNN), Multinomial Logistic Regression (MLR), and Support Vector Machine (SVM); a mobile application implementing the proposed method is also developed. The method is evaluated on IAPQ records gathered from certain areas in Erode district, Tamil Nadu, India. In the analysis, the SVM classifier achieves sensitivity higher by 23.14%, 6.04%, 5.89%, and 11.03% than the Gaussian Naive Bayes, Decision Tree, KNN, and MLR classifiers, respectively.

5.
A software cost estimation model based on fuzzy decision trees is proposed. The model combines the inductive learning capability of decision trees with the expressive power of fuzzy sets: during cost estimation it predicts the relative error range in the form of fuzzy sets and uses the decision tree's rules to analyze error sources. Finally, the effectiveness of the model is validated on historical data from software projects.

6.
Fault prediction is a key measure for estimating software quality and reliability, and many methods, measures, and testing methodologies are available for evaluating software faults. In this paper, a fuzzy-filtered neuro-fuzzy framework is introduced to predict software faults for internal and external software projects. The framework has three primary phases. In the first phase, the effective metrics or measures are identified, from which an accurate decision on software fault prediction can be derived; a composite analytical observation of each software attribute is calculated using the Information Gain and Gain Ratio measures. In the second phase, fuzzy rules are applied to these measures to select effective, high-impact features. In the last phase, a neuro-fuzzy classifier is applied to the fuzzy-filtered training and testing sets. The framework is used to identify software faults in inter-version and inter-project evaluations, with earlier projects or project versions serving as training sets and new projects or versions as testing sets. Experiments are conducted on nine open-source projects taken from the PROMISE repository as well as on the PDE and JDT projects, covering internal version-specific fault prediction and evaluation on external software projects. A comparative analysis is performed against Decision Tree, Random Tree, Random Forest, Naive Bayes, and Multilayer Perceptron classifiers. The results show that the proposed framework achieves higher accuracy, a lower error rate, and significant AUC and GM for inter-project and inter-version evaluations.
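The Information Gain and Gain Ratio measures used in the first phase have standard entropy-based definitions. A minimal, self-contained sketch (the feature and label values below are illustrative, not from the paper):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Entropy reduction obtained by splitting the labels on this feature."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature):
        sub = [l for f, l in zip(feature, labels) if f == v]
        remainder += len(sub) / n * entropy(sub)
    return entropy(labels) - remainder

def gain_ratio(feature, labels):
    """Information gain normalized by the feature's own split entropy."""
    split_info = entropy(feature)
    ig = info_gain(feature, labels)
    return ig / split_info if split_info > 0 else 0.0

labels = [1, 1, 0, 0]
good = ["a", "a", "b", "b"]   # perfectly predictive feature
bad  = ["a", "b", "a", "b"]   # uninformative feature
ig_good, ig_bad = info_gain(good, labels), info_gain(bad, labels)
gr_good = gain_ratio(good, labels)
```

Here `ig_good` is 1.0 bit and `ig_bad` is 0.0, which is the kind of ranking the framework uses to filter metrics before the fuzzy-rule phase.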

7.
Analyzing software measurement data with clustering techniques
For software quality estimation, software development practitioners typically construct quality-classification or fault prediction models using software metrics and fault data from a previous system release or a similar software project, and then use these models to predict the fault proneness of software modules in development. Software quality estimation using supervised-learning approaches is difficult without fault measurement data from similar projects or earlier system releases. Cluster analysis with expert input is a viable unsupervised-learning solution for predicting the fault proneness of software modules and identifying potentially noisy modules. Data analysts and software engineering experts can collaborate more closely to construct and collect more informative software metrics.
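The unsupervised scheme the abstract describes can be sketched in a few lines: cluster modules by their metrics, then let an "expert" rule label the riskier cluster. This is an illustrative sketch only; the two synthetic groups of (LOC, cyclomatic complexity) values below stand in for real measurement data, and the expert step is reduced to a simple mean-complexity comparison.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# 40 ordinary modules vs 10 large, complex modules: columns are (LOC, complexity).
low = rng.normal([50, 3], [10, 1], size=(40, 2))
high = rng.normal([300, 15], [30, 3], size=(10, 2))
X = np.vstack([low, high])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# "Expert input" stand-in: the cluster with higher mean complexity is fault-prone.
means = [X[km.labels_ == k, 1].mean() for k in (0, 1)]
fault_prone = int(means[1] > means[0])
flags = km.labels_ == fault_prone   # True for modules flagged fault-prone
```

In practice the expert inspects cluster centroids across many metrics rather than a single column.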

8.
Attribute reduction algorithms based on neighborhood rough sets consider only the influence of a single attribute on the decision attribute and ignore the correlations among attributes. To address this problem, a neighborhood rough set attribute reduction algorithm based on the chi-square test (ChiS-NRS) is proposed. First, the chi-square test is used to compute correlations, so that the influence of correlated attributes is taken into account when selecting important attributes, reducing time complexity while improving classification accuracy. Then, the improved algorithm is combined with the Gradient Boosting Decision Tree (GBDT) algorithm to build a classification model, which is validated on UCI datasets. Finally, the model is applied to predicting the occurrence of microvascular invasion in hepatocellular carcinoma. Experimental results show that, compared with no reduction, neighborhood rough set reduction, and several other reduction algorithms, the improved algorithm achieves the highest classification accuracy on some UCI datasets. In predicting microvascular invasion, the proposed model reaches a prediction accuracy of 88.13% on the test set, with sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) of 87.10%, 89.29%, and 0.90, respectively, the best results among the compared models, which include a convolutional neural network (CNN), support vector machine (SVM), and random forest (RF). The proposed model can therefore better predict the occurrence of microvascular invasion in hepatocellular carcinoma and assist physicians in making more accurate diagnoses.
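The paper's ChiS-NRS reduction has no off-the-shelf implementation, but its overall pipeline shape, chi-square-based attribute screening feeding a GBDT classifier, can be approximated with scikit-learn. The sketch below substitutes simple univariate chi-square scoring for the full neighborhood rough set reduction and uses synthetic data, so it illustrates only the pipeline, not the paper's algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           random_state=1)
X = X - X.min(axis=0)          # chi2 scoring requires non-negative features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Stage 1 (stand-in for ChiS-NRS): keep the 6 attributes with best chi2 scores.
sel = SelectKBest(chi2, k=6).fit(X_tr, y_tr)
# Stage 2: GBDT on the reduced attribute set.
clf = GradientBoostingClassifier(random_state=1).fit(sel.transform(X_tr), y_tr)
acc = clf.score(sel.transform(X_te), y_te)
```

The real algorithm additionally accounts for correlations *between* candidate attributes, which univariate scoring ignores.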

9.
Predicting defect-prone software modules using support vector machines
Effective prediction of defect-prone software modules can enable software developers to focus quality assurance activities and allocate effort and resources more efficiently. Support vector machines (SVM) have been successfully applied to both classification and regression problems in many applications. This paper evaluates the capability of SVM in predicting defect-prone software modules and compares its prediction performance against eight statistical and machine learning models on four NASA datasets. The results indicate that the prediction performance of SVM is generally better than, or at least competitive with, that of the compared models.

10.
System analysts often use software fault prediction models to identify fault-prone modules during the design phase of the software development life cycle. The models predict faulty modules based on the software metrics given as input. In this study, we consider 20 types of metrics to develop a model using an extreme learning machine with various kernel methods. We evaluate the effectiveness of the model using a proposed framework based on cost and efficiency in the testing phases, with case studies of 30 object-oriented software systems. Experimental results demonstrate that applying a fault prediction model is suitable for projects in which the percentage of faulty classes is below a certain threshold, which depends on the efficiency of fault identification (low: 47.28%; median: 39.24%; high: 25.72%). We consider nine feature selection techniques to remove irrelevant metrics and select the best set of source code metrics for fault prediction.
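An extreme learning machine is a single-hidden-layer network whose hidden weights are fixed at random and whose output weights are solved in closed form by least squares. The minimal NumPy sketch below shows that core idea only; it omits the kernel variants and the cost/efficiency framework of the study, and the blob data is synthetic.

```python
import numpy as np

class ELMClassifier:
    """Minimal extreme learning machine: random hidden layer + least-squares readout."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # Hidden weights are random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # Only the readout is fitted, as a linear least-squares problem.
        target = np.where(y == 1, 1.0, -1.0)
        self.beta, *_ = np.linalg.lstsq(H, target, rcond=None)
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta > 0).astype(int)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 5)), rng.normal(1, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
acc = (ELMClassifier().fit(X, y).predict(X) == y).mean()
```

The closed-form readout is what makes ELM training fast compared with backpropagation-based networks.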

11.
This paper provides a systematic review of previous software fault prediction studies with a specific focus on metrics, methods, and datasets. The review covers 74 software fault prediction papers in 11 journals and several conference proceedings. According to the review results, the usage percentage of public datasets increased significantly and the usage percentage of machine learning algorithms increased slightly since 2005. In addition, method-level metrics are still the most dominant metrics, and machine learning algorithms are still the most popular methods, in the fault prediction research area. Researchers working on software fault prediction should continue to use public datasets and machine learning algorithms to build better fault predictors. The usage percentage of class-level metrics is below acceptable levels, and they should be used much more than they are now in order to predict faults earlier, in the design phase of the software life cycle.

12.
When a software history repository contains few labeled training samples, it is difficult to build an effective prediction model. To address this problem, this paper proposes a semi-supervised dictionary learning method for software defect prediction based on two-stage learning. In the first stage, a sparse representation classifier assigns probabilistic soft labels to a large number of unlabeled samples, which are then added to the labeled training set. In the second stage, discriminative dictionary learning is performed on the expanded training set, and defect proneness is finally predicted with the learned dictionary. Experiments on the NASA MDP and PROMISE AR datasets verify the superiority of the proposed method.
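The first stage, expanding a small labeled set by confidently pseudo-labeling unlabeled samples, is the classic self-training pattern. scikit-learn has no sparse-representation classifier matching the paper, so the sketch below substitutes `SelfTrainingClassifier` around a logistic regression purely to illustrate that stage; the data is synthetic and the dictionary-learning second stage is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=2)

# Hide most labels: -1 marks unlabeled samples, as scikit-learn expects.
rng = np.random.default_rng(2)
y_partial = y.copy()
unlabeled = rng.choice(len(y), size=240, replace=False)
y_partial[unlabeled] = -1

# Confident predictions (prob >= 0.8) are folded back into the training set.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.8)
model.fit(X, y_partial)
acc = (model.predict(X) == y).mean()
```

The paper's soft probabilistic labels differ from this hard-threshold variant, but the labeled-set expansion idea is the same.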

13.
Network fault management aims to detect, identify, and correct error conditions occurring in the network, guaranteeing users reliable and stable network services. In recent years, the use of machine learning for cellular network fault management has attracted wide attention. This paper first introduces the research background of cellular network fault management and clarifies its process and functions; it then describes existing cellular network fault management frameworks. Finally, it reviews existing machine learning methods for cellular network fault management, introducing, summarizing, and comparing methods for fault detection, fault diagnosis, and fault prediction across the fault management cycle, providing a reference for research in related areas.

14.
A number of papers have investigated the relationships between design metrics and the detection of faults in object-oriented software. Several of these studies have shown that such models can be accurate in predicting faulty classes within one particular software product. In practice, however, prediction models are built on certain products to be used on subsequent software development projects. How accurate can these models be, considering the inevitable differences that may exist across projects and systems? Organizations typically learn and change. From a more general standpoint, can we obtain any evidence that such models are economically viable tools for focusing validation and verification effort? This paper attempts to answer these questions by devising a general but tailorable cost-benefit model and by using fault and design data collected on two mid-size Java systems developed in the same environment. Another contribution of the paper is the use of a novel exploratory analysis technique, MARS (multivariate adaptive regression splines), to build fault-proneness models whose functional form is a priori unknown. The results indicate that a model built on one system can accurately rank classes within another system according to their fault proneness. The downside, however, is that, because of system differences, the predicted fault probabilities are not representative of the system predicted. Nevertheless, our cost-benefit model demonstrates that the MARS fault-proneness model is potentially viable from an economic standpoint. The linear model is not nearly as good, suggesting that a more complex model is required.

15.
Predicting the fault-proneness labels of software program modules is an emerging software quality assurance activity, and the quality of datasets collected from previous software versions affects the performance of fault prediction models. In this paper, we propose an outlier detection approach that uses metrics thresholds and class labels to identify class outliers. We evaluate the approach on public NASA datasets from the PROMISE repository. Experiments reveal that this novel outlier detection method improves the performance of robust software fault prediction models based on the Naive Bayes and Random Forests machine learning algorithms.
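The core idea, flagging modules whose metric values and fault labels disagree, is simple enough to sketch directly. The metric names, threshold values, and rows below are hypothetical illustrations, not the thresholds used in the paper.

```python
# Toy module records: lines of code, cyclomatic complexity, and fault label.
rows = [
    {"loc": 30,  "cc": 2,  "faulty": 0},
    {"loc": 450, "cc": 25, "faulty": 0},   # metrics look risky, labeled clean
    {"loc": 40,  "cc": 3,  "faulty": 1},   # metrics look clean, labeled faulty
    {"loc": 500, "cc": 30, "faulty": 1},
]

# Hypothetical thresholds above which a module "looks" fault-prone.
THRESHOLDS = {"loc": 100, "cc": 10}

def looks_faulty(module):
    """True if any metric exceeds its fault-proneness threshold."""
    return any(module[k] > t for k, t in THRESHOLDS.items())

# A class outlier is a module whose threshold verdict contradicts its label.
outliers = [i for i, m in enumerate(rows) if looks_faulty(m) != bool(m["faulty"])]
```

Removing or re-examining such contradictory instances before training is what the paper reports as improving Naive Bayes and Random Forests performance.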

16.
Context: In the software industry, project managers usually rely on their previous experience to estimate the number of man-hours required for each software project. The accuracy of such estimates is a key factor for the efficient application of human resources. Machine learning techniques such as radial basis function (RBF) neural networks, multi-layer perceptron (MLP) neural networks, support vector regression (SVR), bagging predictors, and regression-based trees have recently been applied to estimating software development effort. Some works have demonstrated that the level of accuracy in software effort estimates strongly depends on the values of the parameters of these methods, and it has been shown that the selection of input features may also have an important influence on estimation accuracy. Objective: This paper proposes and investigates a genetic algorithm method for simultaneously (1) selecting an optimal input feature subset and (2) optimizing the parameters of machine learning methods, aiming at a higher accuracy level for software effort estimates. Method: Simulations are carried out using six benchmark data sets of software projects, namely Desharnais, NASA, COCOMO, Albrecht, Kemerer, and Koten and Gray. The results are compared to those obtained by methods proposed in the literature using neural networks, support vector machines, multiple additive regression trees, bagging, and Bayesian statistical models. Results: On all data sets, the simulations show that the proposed GA-based method improves the performance of the machine learning methods and outperforms some methods recently reported in the literature for software effort estimation. Furthermore, the use of the GA for feature selection considerably reduced the number of input features for five of the data sets used in our analysis. Conclusions: The combination of input feature selection and parameter optimization of machine learning methods improves the accuracy of software development effort estimates. It also reduces model complexity, which may help in understanding the relevance of each input feature; some input features can therefore be ignored without loss of estimation accuracy.
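A genetic algorithm over feature subsets works on bitmasks: each individual is a 0/1 vector saying which features to keep, and fitness is validation error (plus a small penalty per feature). The sketch below shows that loop on a synthetic regression where only features 0, 4, and 7 matter; the population size, mutation rate, and least-squares fitness are illustrative choices, not the paper's configuration, and parameter optimization is omitted.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 10))
# Ground truth uses only features 0, 4, and 7.
y = 2 * X[:, 0] - 3 * X[:, 4] + X[:, 7] + 0.1 * rng.normal(size=120)
X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]

def fitness(mask):
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return -1e9
    w, *_ = np.linalg.lstsq(X_tr[:, cols], y_tr, rcond=None)
    err = np.mean((X_va[:, cols] @ w - y_va) ** 2)
    return -err - 0.01 * cols.size        # penalize larger subsets

pop = rng.integers(0, 2, size=(20, 10))   # 20 random feature masks
for _ in range(30):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]        # truncation selection + elitism
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, 10)
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        flip = rng.random(10) < 0.1                 # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
```

In the paper the same chromosome additionally encodes the learner's hyperparameters, so selection and tuning are optimized jointly.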

17.
This paper introduces two neural network based software fault prediction models using object-oriented metrics. They are empirically validated on a data set collected from software modules developed by graduate students of our academic institution. The results are compared with two statistical models using five quality attributes, and the neural networks are found to perform better. Of the two neural networks, Probabilistic Neural Networks perform better at predicting the fault proneness of the object-oriented modules developed.

18.
This paper presents an approach to predicting the operating conditions of a machine based on classification and regression trees (CART) and an adaptive neuro-fuzzy inference system (ANFIS), combined with the direct strategy for multi-step-ahead time series prediction. In this study, the number of available observations and the number of predicted steps are first determined using the false nearest neighbor method and the auto mutual information technique, respectively. These values are then used as inputs to the prediction models to forecast future values of the machine's operating conditions. The performance of the proposed approach is evaluated on real trending data from a low-methane compressor, and a comparative study of the predictions obtained from the CART and ANFIS models is carried out to appraise their prediction capability. The results show that the ANFIS prediction model can track changes in machine conditions and has potential as a tool for machine fault prognosis.
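The direct multi-step strategy trains one separate model per prediction step, each mapping the same lag window to the value h steps ahead, rather than feeding one-step predictions back recursively. A minimal CART-based sketch (synthetic drift-plus-cycle signal standing in for real compressor data; lag length and horizon are illustrative, not the values the paper derives from false nearest neighbors and auto mutual information):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic "machine condition" signal: slow drift plus a periodic component.
t = np.arange(300)
series = 0.01 * t + np.sin(2 * np.pi * t / 25)

LAGS, HORIZON = 5, 3
M = len(series) - LAGS - HORIZON
X = np.array([series[i:i + LAGS] for i in range(M)])

# Direct strategy: a dedicated CART model for each step ahead.
models = {}
for h in range(1, HORIZON + 1):
    yh = np.array([series[i + LAGS + h - 1] for i in range(M)])
    models[h] = DecisionTreeRegressor(max_depth=8, random_state=0).fit(X[:250], yh[:250])

# Forecast 3 steps ahead from a held-out lag window.
window = X[260]
preds = [models[h].predict(window.reshape(1, -1))[0] for h in (1, 2, 3)]
truth = [series[260 + LAGS + h - 1] for h in (1, 2, 3)]
err = max(abs(p - v) for p, v in zip(preds, truth))
```

The direct strategy avoids the error accumulation of recursive forecasting at the cost of training HORIZON separate models.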

19.
To better predict the costs, schedule, and risks of a software project, it is necessary to predict its development effort more accurately. Among the main prediction techniques are those based on mathematical models, such as statistical regression or machine learning (ML). ML models applied to development effort prediction have mainly based their conclusions on studies with the following weaknesses: (1) using an accuracy criterion that leads to asymmetry; (2) applying a validation method that causes conclusion instability by randomly selecting the samples for training and testing the models; (3) omitting the explanation of how the parameters of the neural networks were determined; (4) generating conclusions from models that were not trained and tested on mutually exclusive data sets; (5) omitting an analysis of the dependence, variance, and normality of the data when selecting the suitable statistical test for comparing accuracies among models; and (6) reporting results without showing a statistically significant difference. In this study, these six issues are addressed when comparing the prediction accuracy of a Radial Basis Function Neural Network (RBFNN) with that of statistical regression (the model most frequently compared with ML models), of a feedforward multilayer perceptron (MLP, the model most commonly used for software project effort prediction), and of a general regression neural network (GRNN, an RBFNN variant). The hypothesis tested is that the effort prediction accuracy of the RBFNN is statistically better than that obtained from a simple linear regression (SLR), MLP, and GRNN when adjusted function points data obtained from software projects is used as the independent variable. Samples from the International Software Benchmarking Standards Group (ISBSG) Release 11 related to new and enhanced projects were used. The models were trained and tested using leave-one-out cross-validation, and the criteria for evaluating the models were based on Absolute Residuals and a Friedman statistical test. The results showed a statistically significant difference in accuracy among the four models for new projects, but not for enhanced projects. For new projects, the accuracy of the RBFNN was better than that of the SLR at the 99% confidence level, whereas the MLP and GRNN were better than the SLR at the 90% confidence level.
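The evaluation pipeline the study describes, leave-one-out cross-validation producing per-project Absolute Residuals, compared across models with a Friedman test, can be sketched compactly. The models below (linear fit, quadratic fit, and a constant-mean baseline) and the synthetic size/effort data are illustrative substitutes for the paper's RBFNN, MLP, GRNN, and SLR on ISBSG data.

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(4)
size = rng.uniform(10, 100, 30)                      # stand-in for function points
effort = 5 * size + 0.02 * size**2 + rng.normal(0, 20, 30)

def loo_residuals(fit_predict):
    """Leave-one-out: refit on all-but-one project, record the absolute residual."""
    res = []
    for i in range(len(size)):
        tr = np.arange(len(size)) != i
        res.append(abs(fit_predict(size[tr], effort[tr], size[i]) - effort[i]))
    return res

def linear(xs, ys, x0):
    slope, intercept = np.polyfit(xs, ys, 1)
    return slope * x0 + intercept

def quadratic(xs, ys, x0):
    return np.polyval(np.polyfit(xs, ys, 2), x0)

def mean_model(xs, ys, x0):
    return ys.mean()        # naive baseline predictor

ar = {name: loo_residuals(f)
      for name, f in [("linear", linear), ("quadratic", quadratic),
                      ("mean", mean_model)]}
# Friedman test: nonparametric comparison of the three models' residuals.
stat, p = friedmanchisquare(ar["linear"], ar["quadratic"], ar["mean"])
```

The Friedman test is appropriate here because, as weakness (5) notes, the paired residuals need not be normally distributed.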

20.
The prediction accuracy and generalization ability of neural and neurofuzzy models for chaotic time series prediction depend strongly on the employed network model as well as the learning algorithm. In this study, several neural and neurofuzzy models with different learning algorithms are examined for the prediction of several benchmark chaotic systems and time series. The prediction performance of locally linear neurofuzzy models with the recently developed Locally Linear Model Tree (LoLiMoT) learning algorithm is compared with that of a Radial Basis Function (RBF) neural network with the Orthogonal Least Squares (OLS) learning algorithm, a multilayer perceptron neural network with error back-propagation learning, and an Adaptive Network-based Fuzzy Inference System. In particular, cross-validation based on evaluating error indices on multiple validation sets is used to optimize the number of neurons and to prevent overfitting in the incremental learning algorithms. To make a fair comparison, the neural and neurofuzzy models are compared at their best structure based on prediction accuracy, generalization, and computational complexity. The experiments are designed primarily to analyze the generalization capability and accuracy of the learning techniques when dealing with a limited number of training samples from deterministic chaotic time series, but the effect of noise on performance is also considered. Various chaotic systems and time series, including the Lorenz system, the Mackey-Glass equation, the Henon map, the AE geomagnetic activity index, and sunspot numbers, are examined as case studies. The obtained results indicate the superior performance of the incremental learning algorithms and their respective networks, such as OLS for the RBF network and LoLiMoT for the locally linear neurofuzzy model.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号