Found 20 similar documents (search time: 15 ms)
1.
Fault Prediction Modeling for Software Quality Estimation: Comparing Commonly Used Techniques (cited by: 6; self-citations: 0; citations by others: 6)
High-assurance and complex mission-critical software systems depend heavily on the reliability of their underlying software applications. Early software fault prediction is a proven technique for achieving high software reliability. Prediction models based on software metrics can predict the number of faults in software modules. Timely predictions from such models can be used to direct cost-effective quality enhancement efforts to modules that are likely to have a high number of faults. We evaluate the predictive performance of six commonly used fault prediction techniques: CART-LS (least squares), CART-LAD (least absolute deviation), S-PLUS, multiple linear regression, artificial neural networks, and case-based reasoning. The case study consists of software metrics collected over four releases of a very large telecommunications system. Two performance metrics, average absolute error and average relative error, are used to gauge the accuracy of the prediction models. Models were built using both the original software metrics (RAW) and their principal components (PCA). Two-way ANOVA randomized-complete block design models with two blocking variables are designed with average absolute and average relative errors as response variables. System release and model type (RAW or PCA) form the blocking variables, and the prediction technique is treated as a factor. Using multiple-pairwise comparisons, the performance order of the prediction models is determined. We observe that for both average absolute and average relative errors, the CART-LAD model performs best while the S-PLUS model is ranked sixth.
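The two accuracy measures named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the formulas, so the `+ 1` in the relative-error denominator (to avoid dividing by zero for fault-free modules) and the sample fault counts are assumptions.

```python
from typing import Sequence


def average_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| over all modules."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| / (actual + 1).

    Assumption: the +1 keeps modules with zero faults from causing a
    division by zero; the abstract itself does not state the formula.
    """
    return sum(abs(a - p) / (a + 1) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical fault counts for four modules.
actual_faults = [0, 2, 5, 1]
predicted_faults = [1.0, 2.0, 3.0, 1.0]

aae = average_absolute_error(actual_faults, predicted_faults)
are = average_relative_error(actual_faults, predicted_faults)
print(f"AAE = {aae:.3f}, ARE = {are:.3f}")
```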
2.
Comparative Assessment of Software Quality Classification Techniques: An Empirical Case Study (cited by: 4; self-citations: 0; citations by others: 4)
Software metrics-based quality classification models predict a software module as either fault-prone (fp) or not fault-prone (nfp). Timely application of such models can assist in directing quality improvement efforts to modules that are likely to be fp during operations, thereby cost-effectively utilizing the software quality testing and enhancement resources. Since several classification techniques are available, a relative comparative study of some commonly used classification techniques can be useful to practitioners. We present a comprehensive evaluation of the relative performances of seven classification techniques and/or tools. These include logistic regression, case-based reasoning, classification and regression trees (CART), tree-based classification with S-PLUS, and the Sprint-Sliq, C4.5, and Treedisc algorithms. The expected cost of misclassification (ECM) is introduced as a single unified measure to compare the performances of different software quality classification models. A function of the costs of the Type I (an nfp module misclassified as fp) and Type II (an fp module misclassified as nfp) misclassifications, ECM is computed for different cost ratios. Evaluating software quality classification models in the presence of varying cost ratios is important, because the usefulness of a model is dependent on the system-specific costs of misclassifications. Moreover, models should be compared and preferred for cost ratios that fall within the range of interest for the given system and project domain. Software metrics were collected from four successive releases of a large legacy telecommunications system. A two-way ANOVA randomized-complete block design modeling approach is used, in which the system release is treated as a block, while the modeling method is treated as a factor. It is observed that the predictive performances of the models are significantly different across the system releases, implying that in the software engineering domain prediction models are influenced by the characteristics of the data and the system being modeled. Multiple-pairwise comparisons are performed to evaluate the relative performances of the seven models for the cost ratios of interest to the case study. In addition, the performance of the seven classification techniques is also compared with a classification based on lines of code. The comparative approach presented in this paper can also be applied to other software systems.
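To make the role of the cost ratio concrete, the sketch below computes a normalized ECM from a confusion matrix. The abstract only says that ECM is a function of the two misclassification costs, so the formula used here (error rates weighted by class priors, divided by the Type I cost) is one common formulation assumed for illustration, and all counts are hypothetical.

```python
def normalized_ecm(n_nfp: int, n_fp: int, type1_errors: int,
                   type2_errors: int, cost_ratio: float) -> float:
    """Expected cost of misclassification per module, in units of the Type I cost.

    cost_ratio is C_II / C_I, the cost of missing an fp module relative to
    the cost of needlessly flagging an nfp module (assumed formulation).
    """
    total = n_nfp + n_fp
    type1_rate = type1_errors / n_nfp   # Pr(predicted fp | actually nfp)
    type2_rate = type2_errors / n_fp    # Pr(predicted nfp | actually fp)
    prior_nfp = n_nfp / total
    prior_fp = n_fp / total
    return type1_rate * prior_nfp + cost_ratio * type2_rate * prior_fp


# Hypothetical release: 900 nfp modules, 100 fp modules,
# 180 Type I errors and 20 Type II errors.
ecm_by_ratio = {
    ratio: normalized_ecm(900, 100, 180, 20, ratio) for ratio in (10, 20, 50)
}
for ratio, ecm in ecm_by_ratio.items():
    print(f"cost ratio {ratio}: normalized ECM = {ecm:.2f}")
```

Because the same confusion matrix yields a different ECM at each cost ratio, two models can swap rank as the ratio changes, which is why the comparison is restricted to ratios of interest for the system.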
3.
The primary aim of risk-based software quality classification models is to detect, prior to testing or operations, components that are most likely to be high-risk. Their practical usage as quality assurance tools is gauged by the prediction accuracy and cost-effectiveness of the models. Classifying modules into two risk groups is the more commonly practiced trend. Such models assume that all modules predicted as high-risk will be subjected to quality improvements. Due to the always-limited reliability improvement resources and the variability of the quality risk-factor, a more focused classification model may be desired to achieve cost-effective software quality assurance goals. In such cases, calibrating a three-group (high-risk, medium-risk, and low-risk) classification model is more rewarding. We present an innovative method that circumvents the complexities, computational overhead, and difficulties involved in calibrating pure or direct three-group classification models. With the application of the proposed method, practitioners can utilize an existing two-group classification algorithm thrice in order to yield the three risk-based classes. An empirical approach is taken to investigate the effectiveness and validity of the proposed technique. Some commonly used classification techniques are studied to demonstrate the proposed methodology. They include the C4.5 decision tree algorithm, discriminant analysis, and case-based reasoning. For the first two, we compare the three-group model calibrated using the respective techniques with the one built by applying the proposed method. Any two-group classification technique can be employed by the proposed method, including those that do not provide a direct three-group classification model, e.g., logistic regression and certain binary classification trees, such as CART. Based on a case study of a large-scale industrial software system, it is observed that the proposed method yielded promising results. For a given classification technique, the expected cost of misclassification of the proposed three-group models was generally significantly better than that of the technique's direct three-group model. In addition, the proposed method is also evaluated against an alternate indirect three-group classification method.
4.
Taghi M. Khoshgoftaar Xiaojing Yuan Edward B. Allen Wendell D. Jones John P. Hudepohl 《Empirical Software Engineering》2002,7(4):297-318
Many development organizations try to minimize faults in software as a means for improving customer satisfaction. Assuring high software quality often entails time-consuming and costly development processes. A software quality model based on software metrics can be used to guide enhancement efforts by predicting which modules are fault-prone. This paper presents statistical techniques to determine which predictions by a classification tree should be considered uncertain. We conducted a case study of a large legacy telecommunications system. One release was the basis for the training dataset, and the subsequent release was the basis for the evaluation dataset. We built a classification tree using the TREEDISC algorithm, which is based on χ² tests of contingency tables. The model predicted whether a module was likely to have faults discovered by customers, or not, based on software product, process, and execution metrics. We simulated practical use of the model by classifying the modules in the evaluation dataset. The model achieved useful accuracy, in spite of the very small proportion of fault-prone modules in the system. We assessed whether the classes assigned to the leaves were appropriate by statistical tests, and found sizable subsets of modules with uncertain classification. Discovering which modules have uncertain classifications allows sophisticated enhancement strategies to resolve uncertainties. Moreover, TREEDISC is especially well suited to identifying uncertain classifications.
5.
Classification techniques for metric-based software development (cited by: 1; self-citations: 0; citations by others: 1)
Christof Ebert 《Software Quality Journal》1996,5(4):255-272
Managing software development and maintenance projects requires predictions about components of the software system that are likely to have a high error rate or that need high development effort. The value of any classification is determined by the accuracy and cost of such predictions. The paper investigates whether fuzzy classification applied to criticality prediction provides better results than other classification techniques that have been introduced in this area. Five techniques for identifying error-prone software components are compared, namely Pareto classification, crisp classification trees, factor-based discriminant analysis, neural networks, and fuzzy classification. The comparison is illustrated with experimental results from the development of industrial real-time projects. A module quality model (with respect to changes) provides both quality of fit (according to past data) and predictive accuracy (according to ongoing projects). Fuzzy classification showed the best results in terms of overall predictive accuracy.
6.
7.
Software quality models can predict the quality of modules early enough for cost-effective prevention of problems. For example, software product and process metrics can be the basis for predicting reliability. Predicting the exact number of faults is often not necessary; classification models can identify fault-prone modules. However, such models require that fault-prone be defined before modeling, usually via a threshold. This may not be practical due to uncertain limits on the amount of reliability-improvement effort. In such cases, predicting the rank-order of modules is more useful. A module-order model predicts the rank-order of modules according to a quantitative quality factor, such as the number of faults. This paper demonstrates how module-order models can be used for classification, and compares them with statistical classification models. Two case studies of full-scale industrial software systems compared nonparametric discriminant analysis with module-order models. One case study examined a military command, control, and communications system. The other studied a large legacy telecommunications system. We found that module-order models give management more flexible reliability enhancement strategies than classification models, and in these case studies, yielded more accurate results than corresponding discriminant models.
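The idea of using a rank-order for classification can be sketched as below: modules are sorted by their predicted number of faults, and the cutoff is chosen afterwards from the available effort rather than from a fault threshold fixed before modeling. The function name, module names, and predicted values are hypothetical, and the sketch does not reproduce the paper's models.

```python
import math
from typing import Dict


def classify_by_rank(predicted_faults: Dict[str, float], top_fraction: float) -> Dict[str, str]:
    """Label the top-ranked fraction of modules as fault-prone (fp), the rest as nfp."""
    # Rank modules from most to least predicted faults.
    ranked = sorted(predicted_faults, key=predicted_faults.get, reverse=True)
    n_flagged = math.ceil(top_fraction * len(ranked))
    return {
        module: ("fp" if position < n_flagged else "nfp")
        for position, module in enumerate(ranked)
    }


# Hypothetical predictions from some underlying quantitative model.
predictions = {"parser": 4.2, "scheduler": 0.3, "billing": 7.9, "logging": 1.1}

# Management decides it can afford to enhance half of the modules.
labels = classify_by_rank(predictions, top_fraction=0.5)
print(labels)
```

Changing `top_fraction` re-labels the modules without refitting anything, which is the flexibility the abstract attributes to module-order models.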
8.
Conventional approaches to software cost estimation have focused on algorithmic cost models, where an estimate of effort is calculated from one or more numerical inputs via a mathematical model. Analogy-based estimation has recently emerged as a promising approach, with comparable accuracy to algorithmic methods in some studies, and it is potentially easier to understand and apply. The current study compares several methods of analogy-based software effort estimation with each other and also with a simple linear regression model. The results show that people are better than tools at selecting analogues for the data set used in this study. Estimates based on their selections, with a linear size adjustment to the analogue's effort value, proved more accurate than estimates based on analogues selected by tools, and also more accurate than estimates based on the simple regression model.
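A linear size adjustment of an analogue's effort can be sketched as follows. The selection step here simply picks the past project closest in size, which is an assumption standing in for whatever a tool or a person would choose; the project sizes and efforts are invented for illustration.

```python
from typing import List, Tuple

Project = Tuple[float, float]  # (size, effort), units are hypothetical


def estimate_by_analogy(target_size: float, past_projects: List[Project]) -> float:
    """Estimate effort from the closest analogue, scaled linearly by size."""
    # Assumed selection rule: the analogue is the past project nearest in size.
    analogue_size, analogue_effort = min(
        past_projects, key=lambda project: abs(project[0] - target_size)
    )
    # Linear size adjustment: effort scales with the ratio of sizes.
    return analogue_effort * target_size / analogue_size


history = [(100.0, 400.0), (250.0, 900.0), (600.0, 2500.0)]
estimate = estimate_by_analogy(300.0, history)
print(f"Estimated effort: {estimate:.1f}")
```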
9.
10.
When building software quality models, the approach often consists of training data mining learners on a single fit dataset.
Typically, this fit dataset contains software metrics collected during a past release of the software project that we want
to predict the quality of. In order to improve the predictive accuracy of such quality models, it is common practice to combine
the predictive results of multiple learners to take advantage of their respective biases. Although multi-learner classifiers
have been proven to be successful in some cases, the improvement is not always significant because the information in the
fit dataset sometimes can be insufficient. We present an innovative method to build software quality models using majority
voting to combine the predictions of multiple learners induced on multiple training datasets. To our knowledge, no previous
study in software quality has attempted to take advantage of multiple software project data repositories which are generally
spread across the organization. In a large scale empirical study involving seven real-world datasets and seventeen learners,
we show that, on average, combining the predictions of one learner trained on multiple datasets significantly improves the
predictive performance compared to one learner induced on a single fit dataset. We also demonstrate empirically that combining
multiple learners trained on a single training dataset does not significantly improve the average predictive accuracy compared
to the use of a single learner induced on a single fit dataset.
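The combination scheme described above, one learner induced on several training datasets with predictions merged by majority vote, can be sketched as follows. The single-metric threshold learner and the three small datasets are hypothetical stand-ins; the study itself used seventeen learners and seven real-world datasets.

```python
from collections import Counter
from typing import Callable, List, Tuple

Instance = Tuple[float, str]  # (metric value, label "fp" or "nfp")


def train_threshold_learner(dataset: List[Instance]) -> Callable[[float], str]:
    """Toy learner: threshold at the midpoint between the two class means."""
    fp_values = [x for x, label in dataset if label == "fp"]
    nfp_values = [x for x, label in dataset if label == "nfp"]
    threshold = (sum(fp_values) / len(fp_values) + sum(nfp_values) / len(nfp_values)) / 2
    return lambda x: "fp" if x > threshold else "nfp"


def majority_vote(classifiers: List[Callable[[float], str]], x: float) -> str:
    """Return the label predicted by most classifiers."""
    votes = Counter(classifier(x) for classifier in classifiers)
    return votes.most_common(1)[0][0]


# Three hypothetical project repositories with the same metric.
datasets = [
    [(10, "nfp"), (20, "nfp"), (80, "fp"), (90, "fp")],     # threshold 50
    [(5, "nfp"), (15, "nfp"), (40, "fp"), (60, "fp")],      # threshold 30
    [(30, "nfp"), (50, "nfp"), (100, "fp"), (120, "fp")],   # threshold 75
]
classifiers = [train_threshold_learner(dataset) for dataset in datasets]

print(majority_vote(classifiers, 45))
print(majority_vote(classifiers, 60))
```

An odd number of datasets avoids ties between the two classes in this sketch.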
Taghi M. Khoshgoftaar is a professor of the Department of Computer Science and Engineering, Florida Atlantic University and the Director of the Empirical Software Engineering and Data Mining and Machine Learning Laboratories. His research interests are in software engineering, software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, machine learning, and statistical modeling. He has published more than 350 refereed papers in these areas. He is a member of the IEEE, IEEE Computer Society, and IEEE Reliability Society. He was the program chair and general chair of the IEEE International Conference on Tools with Artificial Intelligence in 2004 and 2005 respectively and is the program chair of the 20th International Conference on Software Engineering and Knowledge Engineering (2008). He has served on technical program committees of various international conferences, symposia, and workshops. Also, he has served as North American Editor of the Software Quality Journal, and is on the editorial boards of the journals Software Quality and Fuzzy systems. Pierre Rebours received the M.S. degree in Computer Engineering from Florida Atlantic University, Boca Raton, FL, USA, in April 2004. His research interests include quality of data and data mining. Naeem Seliya is an Assistant Professor of Computer and Information Science at the University of Michigan-Dearborn. He received his Ph.D. in Computer Engineering from Florida Atlantic University, Boca Raton, FL, USA in 2005. His research interests include software engineering, data mining and machine learning, software measurement, software reliability and quality engineering, software architecture, computer data security, and network intrusion detection. He is a member of the IEEE and the Association for Computing Machinery.
11.
The accuracy of machine learners is affected by the quality of the data the learners are induced on. In this paper, the quality of the training dataset is improved by removing instances detected as noisy by the Partitioning Filter. The fit dataset is first split into subsets, and different base learners are induced on each of these splits. The predictions are combined in such a way that an instance is identified as noisy if it is misclassified by a certain number of base learners. Two versions of the Partitioning Filter are used: the Multiple-Partitioning Filter and the Iterative-Partitioning Filter. The number of instances removed by the filters is tuned by the voting scheme of the filter and the number of iterations. The primary aim of this study is to compare the predictive performances of the final models built on the filtered and the un-filtered training datasets. A case study of software measurement data of a high-assurance software project is performed. It is shown that predictive performances of models built on the filtered fit datasets and evaluated on a noisy test dataset are generally better than those built on the noisy (un-filtered) fit dataset. However, predictive performance based on certain aggressive filters is affected by the presence of noise in the evaluation dataset.
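The filtering steps in this abstract (split the fit dataset, induce a learner on each split, flag instances that enough learners misclassify) can be sketched as below. This is a single-pass illustration under stated assumptions: the round-robin split, the one-metric threshold learner, and the data are hypothetical, and neither the Multiple-Partitioning nor the Iterative-Partitioning variant is reproduced in detail.

```python
from typing import Callable, List, Tuple

Instance = Tuple[float, str]  # (metric value, label "fp" or "nfp")


def induce_threshold_learner(subset: List[Instance]) -> Callable[[float], str]:
    """Toy base learner: threshold at the midpoint between the class means."""
    fp_values = [x for x, label in subset if label == "fp"]
    nfp_values = [x for x, label in subset if label == "nfp"]
    threshold = (sum(fp_values) / len(fp_values) + sum(nfp_values) / len(nfp_values)) / 2
    return lambda x: "fp" if x > threshold else "nfp"


def partitioning_filter(fit_data: List[Instance], n_partitions: int,
                        min_misclassifications: int):
    """Return (clean, noisy) after one pass of a partition-based noise filter."""
    # Assumed splitting scheme: round-robin assignment to partitions.
    subsets = [fit_data[i::n_partitions] for i in range(n_partitions)]
    learners = [induce_threshold_learner(subset) for subset in subsets]
    # An instance is noisy if enough base learners disagree with its label.
    noisy = [
        (x, label) for x, label in fit_data
        if sum(learner(x) != label for learner in learners) >= min_misclassifications
    ]
    clean = [instance for instance in fit_data if instance not in noisy]
    return clean, noisy


fit_data = [
    (10, "nfp"), (12, "nfp"), (14, "nfp"), (90, "fp"), (92, "fp"), (94, "fp"),
    (16, "nfp"), (18, "nfp"), (20, "nfp"), (96, "fp"), (98, "fp"),
    (91, "nfp"),  # suspicious: looks like an fp module but is labeled nfp
]
clean, noisy = partitioning_filter(fit_data, n_partitions=3, min_misclassifications=2)
print(noisy)
```

Raising `min_misclassifications` to the number of partitions gives a consensus vote that removes fewer instances, while lowering it makes the filter more aggressive.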
12.
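Traditional process mining methods cannot be applied to software process data because activity information and case attributes are missing. To address this problem, this paper takes software process data as its object of study and proposes a two-level software process mining method. At the activity level, a weighted structural link vector model is proposed to vectorize the process logs; the average activity entropy is used to determine the result of fuzzy clustering of the process logs, and the clustering result serves as the activity information that supports the subsequent mining work. At the process level, building on heuristic relation metrics, the paper studies incomplete loops, proposes log completeness conditions for partitioning single-trigger sequence loops at the process level, and further gives a metric for loop attribution. Experimental results on a large amount of real software process data demonstrate the feasibility and correctness of the two-level software process mining method.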
13.
Software is quite often expensive to develop and can become a major cost factor in corporate information systems budgets. With the variability of software characteristics and the continual emergence of new technologies the accurate prediction of software development costs is a critical problem within the project management context.
In order to address this issue a large number of software cost prediction models have been proposed. Each model succeeds to some extent but they all encounter the same problem, i.e., the inconsistency and inadequacy of the historical data sets. Often a preliminary data analysis has not been performed and it is possible for the data to contain non-dominated or confounded variables. Moreover, some of the project attributes or their values are inappropriately out of date, for example the type of computer used for project development in the COCOMO 81 (Boehm, 1981) data set.
This paper proposes a framework composed of a set of clearly identified steps that should be performed before a data set is used within a cost estimation model. This framework is based closely on a paradigm proposed by Maxwell (2002). Briefly, the framework applies a set of statistical approaches, that includes correlation coefficient analysis, Analysis of Variance and Chi-Square test, etc., to the data set in order to remove outliers and identify dominant variables.
To ground the framework within a practical context the procedure is used to analyze the ISBSG (International Software Benchmarking Standards Group data—Release 8) data set. This is a frequently used accessible data collection containing information for 2,008 software projects. As a consequence of this analysis, 6 explanatory variables are extracted and evaluated.
14.
Emilia Mendes Ian Watson Chris Triggs Nile Mosley Steve Counsell 《Empirical Software Engineering》2003,8(2):163-196
Software cost models and effort estimates help project managers allocate resources, control costs and schedule and improve current practices, leading to projects finished on time and within budget. In the context of Web development, these issues are also crucial, and very challenging given that Web projects have short schedules and very fluidic scope. In the context of Web engineering, few studies have compared the accuracy of different types of cost estimation techniques with emphasis placed on linear and stepwise regressions, and case-based reasoning (CBR). To date only one type of CBR technique has been employed in Web engineering. We believe results obtained from that study may have been biased, given that other CBR techniques can also be used for effort prediction. Consequently, the first objective of this study is to compare the prediction accuracy of three CBR techniques to estimate the effort to develop Web hypermedia applications and to choose the one with the best estimates. The second objective is to compare the prediction accuracy of the best CBR technique against two commonly used prediction models, namely stepwise regression and regression trees. One dataset was used in the estimation process and the results showed that the best predictions were obtained for stepwise regression.
15.
A Simulation Tool for Efficient Analogy Based Cost Estimation (cited by: 1; self-citations: 0; citations by others: 1)
Estimating the effort of a software project from analogous projects is a promising method in the area of software cost estimation. Projects in a historical database that are analogous (similar) to the project under examination are detected, and their effort data are used to produce estimates. As in all software cost estimation approaches, important decisions must be made regarding certain parameters, in order to calibrate with local data and obtain reliable estimates. In this paper, we present a statistical simulation tool, namely the bootstrap method, which helps the user in tuning the analogy approach before application to real projects. This is an essential step of the method, because if inappropriate values for the parameters are selected in the first place, the estimate will inevitably be wrong. Additionally, we show how measures of accuracy, and in particular confidence intervals, may be computed for the analogy-based estimates, using the bootstrap method with different assumptions about the population distribution of the data set. Estimate confidence intervals are necessary in order to assess point estimate accuracy and assist risk analysis and project planning. Examples of bootstrap confidence intervals and a comparison with regression models are presented on well-known cost data sets.
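A bootstrap confidence interval for an analogy-based estimate can be sketched as below. The abstract mentions several assumptions about the population distribution; this sketch shows only a nonparametric percentile bootstrap, takes the estimate to be the mean effort of the selected analogues, and uses invented effort values, all of which are assumptions.

```python
import random
from typing import List, Tuple


def bootstrap_interval(analogue_efforts: List[float], n_resamples: int = 1000,
                       alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean effort of the analogues."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(analogue_efforts)
    # Resample the analogues with replacement and recompute the estimate each time.
    resampled_means = sorted(
        sum(rng.choices(analogue_efforts, k=n)) / n for _ in range(n_resamples)
    )
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical efforts (person-hours) of the analogues found for a new project.
efforts = [820.0, 950.0, 1100.0, 1240.0, 1600.0]
point_estimate = sum(efforts) / len(efforts)
low, high = bootstrap_interval(efforts)
print(f"estimate = {point_estimate:.0f}, 95% interval = [{low:.0f}, {high:.0f}]")
```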
16.
17.
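Without appropriate use of software measurement, software may end up with low quality and high cost. Cohesion is one of the important factors of software quality, along with maintainability, reliability, and reusability. The quality of software modules inevitably affects the quality of the overall system. To design and maintain high-quality software, software project managers and software engineers unavoidably need software cohesion measures to assess and produce high-quality software. This paper proposes a function-oriented cohesion measure whose analysis is based on live variables and the visualized span of variables. A series of practical cases is then used for experimental validation, and a set of properties is used to justify the proposed measure theoretically. A well-defined, well-tested, and well-justified cohesion measure is thus proposed as an indicator of software cohesion strength, thereby improving software quality. The cohesion measure can easily be embedded in CASE tools to help software engineers ensure software quality.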
18.
19.
Taghi M. Khoshgoftaar Xiaojing Yuan Edward B. Allen 《Empirical Software Engineering》2000,5(4):313-330
Software product and process metrics can be useful predictors of which modules are likely to have faults during operations. Developers and managers can use such predictions by software quality models to focus enhancement efforts before release. However, in practice, software quality modeling methods in the literature may not produce a useful balance between the two kinds of misclassification rates, especially when there are few faulty modules. This paper presents a practical classification rule in the context of classification tree models that allows appropriate emphasis on each type of misclassification according to the needs of the project. This is especially important when the faulty modules are rare. An industrial case study using classification trees illustrates the tradeoffs. The trees were built using the TREEDISC algorithm, which is a refinement of the CHAID algorithm. We examined two releases of a very large telecommunications system, and built models suited to two points in the development life cycle: the end of coding and the end of beta testing. Both trees had only five significant predictors, out of 28 and 42 candidates, respectively. We interpreted the structure of the classification trees, and we found the models had useful accuracy.
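The tradeoff between the two misclassification rates can be illustrated with a tunable leaf-labeling rule. The abstract does not state the paper's rule, so the rule below (label a leaf fault-prone when its share of faulty training modules reaches a chosen threshold) is an assumption, and the leaf counts are hypothetical.

```python
from typing import List, Tuple

Leaf = Tuple[int, int]  # (fault-prone modules, not fault-prone modules) in a leaf


def label_leaf(n_fp: int, n_nfp: int, threshold: float) -> str:
    """Assumed rule: a leaf is fp when its fp share reaches the threshold."""
    return "fp" if n_fp / (n_fp + n_nfp) >= threshold else "nfp"


def misclassification_rates(leaves: List[Leaf], threshold: float) -> Tuple[float, float]:
    """Return (Type I rate, Type II rate) on the modules that built the tree."""
    total_fp = sum(n_fp for n_fp, _ in leaves)
    total_nfp = sum(n_nfp for _, n_nfp in leaves)
    # Type I: nfp modules that fall in leaves labeled fp.
    type1 = sum(n_nfp for n_fp, n_nfp in leaves if label_leaf(n_fp, n_nfp, threshold) == "fp")
    # Type II: fp modules that fall in leaves labeled nfp.
    type2 = sum(n_fp for n_fp, n_nfp in leaves if label_leaf(n_fp, n_nfp, threshold) == "nfp")
    return type1 / total_nfp, type2 / total_fp


# Hypothetical tree with three leaves; faulty modules are rare.
leaves = [(2, 98), (10, 90), (30, 70)]

majority_rule = misclassification_rates(leaves, threshold=0.5)
lowered_rule = misclassification_rates(leaves, threshold=0.08)
print(majority_rule, lowered_rule)
```

With rare faulty modules, the majority rule labels every leaf nfp and misses all faulty modules, whereas a lower threshold trades a higher Type I rate for a much lower Type II rate.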