
1.

Investigating the use of moving windows to improve software effort prediction: a replicated study

Chris Lokan, Emilia Mendes. Empirical Software Engineering, 2017, 22(2): 716-767

To date most research in software effort estimation has not taken chronology into account when selecting projects for training and validation sets. A chronological split represents the use of a project’s starting and completion dates, such that any model that estimates effort for a new project p only uses as its training set projects that have been completed prior to p’s starting date. A study in 2009 (“S3”) investigated the use of chronological splits taking into account a project’s age. The research question investigated was whether the use of a training set containing only the most recent past projects (a “moving window” of recent projects) would lead to more accurate estimates when compared to using the entire history of past projects completed prior to the starting date of a new project. S3 found that moving windows could improve the accuracy of estimates. The study described herein replicates S3 using three different and independent data sets. Estimation models were built using regression, and accuracy was measured using absolute residuals. The results contradict S3, as they do not show any gain in estimation accuracy when using windows for effort estimation. This is a surprising result: the intuition that recent data should be more helpful than old data for effort estimation is not supported. Several factors, which are discussed in this paper, might have contributed to these contradictory results. Some of our future work entails replicating this work using other datasets, to better understand when using windows is a suitable choice for software companies.
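
A minimal Python sketch of the chronological split and a fixed-size moving window as described above. It assumes each project record carries a start date, a completion date and an effort value. The `Project` type, the function name and the sample data are hypothetical, and the sketch omits the regression step that would be trained on the returned projects.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Project:
    name: str
    start: date
    end: date
    effort: float  # unit (e.g. person-hours) is an assumption


def chronological_training_set(history, target, window_size=None):
    """Projects usable to train a model that estimates `target`.

    Only projects completed before the target's start date are eligible.
    If `window_size` is given, keep just the `window_size` most recently
    completed eligible projects (a fixed-size moving window).
    """
    eligible = sorted(
        (p for p in history if p.end < target.start), key=lambda p: p.end
    )
    if window_size is not None:
        eligible = eligible[max(0, len(eligible) - window_size):]
    return eligible


# Hypothetical example data.
history = [
    Project("A", date(2001, 1, 1), date(2001, 6, 1), 1200.0),
    Project("B", date(2001, 3, 1), date(2002, 1, 15), 3400.0),
    Project("C", date(2002, 2, 1), date(2002, 9, 30), 800.0),
    Project("D", date(2002, 5, 1), date(2003, 4, 1), 2100.0),  # not finished before P starts
]
new_project = Project("P", date(2002, 10, 1), date(2003, 3, 1), 0.0)

print([p.name for p in chronological_training_set(history, new_project)])  # ['A', 'B', 'C']
print([p.name for p in chronological_training_set(history, new_project, 2)])  # ['B', 'C']
```

Project D is excluded because it completes after P starts. With a window of two, only the two most recently completed projects remain as training data.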

2.

GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation

Adriano L.I. Oliveira, Petronio L. Braga, Ricardo M.F. Lima, Márcio L. Cornélio. Information and Software Technology, 2010, 52(11): 1155-1166

Context: In the software industry, project managers usually rely on their previous experience to estimate the number of man-hours required for each software project. The accuracy of such estimates is a key factor for the efficient application of human resources. Machine learning techniques such as radial basis function (RBF) neural networks, multi-layer perceptron (MLP) neural networks, support vector regression (SVR), bagging predictors and regression-based trees have recently been applied to estimating software development effort. Some works have demonstrated that the accuracy of software effort estimates strongly depends on the values of these methods' parameters. In addition, it has been shown that the selection of input features may also have an important influence on estimation accuracy.
Objective: This paper proposes and investigates a genetic algorithm (GA) method that simultaneously (1) selects an optimal input feature subset and (2) optimizes the parameters of machine learning methods, aiming at more accurate software effort estimates.
Method: Simulations are carried out using six benchmark data sets of software projects, namely Desharnais, NASA, COCOMO, Albrecht, Kemerer, and Koten and Gray. The results are compared with those obtained by methods proposed in the literature using neural networks, support vector machines, multiple additive regression trees, bagging, and Bayesian statistical models.
Results: In all data sets, the simulations show that the proposed GA-based method was able to improve the performance of the machine learning methods. The simulations also show that the proposed method outperforms some methods reported in the recent literature for software effort estimation. Furthermore, the use of GA for feature selection considerably reduced the number of input features for five of the data sets used in our analysis.
Conclusions: Combining input feature selection with parameter optimization of machine learning methods improves the accuracy of software development effort estimates. It also reduces model complexity, which may help in understanding the relevance of each input feature. Therefore, some input parameters can be ignored without loss of estimation accuracy.

3.

Modeling development effort in object-oriented systems using design properties

L.C. Briand, J. Wüst. IEEE Transactions on Software Engineering, 2001, 27(11): 963-986

In the context of software cost estimation, system size is widely taken as a main driver of system development effort. However, other structural design properties, such as coupling, cohesion, and complexity, have been suggested as additional cost factors. Using effort data from an object-oriented development project, we empirically investigate the relationship between class size and the development effort for a class, and what additional impact structural properties such as class coupling have on effort. The paper proposes a practical, repeatable, and accurate analysis procedure to investigate relationships between structural properties and development effort. Results indicate that fairly accurate predictions of class effort can be made based on simple measures of the class interface size alone (mean MREs below 30 percent). Effort predictions at the system level are even more accurate: using bootstrapping, the estimated 95 percent confidence interval for MREs is 3 to 23 percent. More sophisticated coupling and cohesion measures, however, do not improve these predictions to a degree that would be practically significant. In contrast, hybrid models combining Poisson regression and CART regression trees clearly improve accuracy compared with Poisson regression alone.
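
The MRE figures and the bootstrap interval quoted above can be illustrated with a short sketch. It computes MRE = |actual - predicted| / actual for each class and a percentile bootstrap interval for the mean MRE. The function names and sample numbers are illustrative only. This generic percentile bootstrap is an assumption and is not necessarily the exact resampling scheme used in the paper.

```python
import random
import statistics


def mre(actual, predicted):
    """Magnitude of relative error: |actual - predicted| / actual."""
    return abs(actual - predicted) / actual


def bootstrap_ci_mean(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical actual vs. predicted efforts for five classes.
actuals = [120.0, 80.0, 200.0, 150.0, 95.0]
preds = [100.0, 90.0, 210.0, 120.0, 100.0]
mres = [mre(a, p) for a, p in zip(actuals, preds)]
print(round(statistics.fmean(mres), 3))  # 0.119
print(bootstrap_ci_mean(mres))
```

Each bootstrap mean lies between the smallest and largest observed MRE. With few observations the interval is therefore bounded by the data and can be quite wide.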

4.

A Comparative Study of Cost Estimation Models for Web Hypermedia Applications

Emilia Mendes, Ian Watson, Chris Triggs, Nile Mosley, Steve Counsell. Empirical Software Engineering, 2003, 8(2): 163-196

Software cost models and effort estimates help project managers allocate resources, control costs and schedules, and improve current practices, leading to projects finished on time and within budget. In the context of Web development, these issues are also crucial, and very challenging given that Web projects have short schedules and a very fluid scope. In Web engineering, few studies have compared the accuracy of different types of cost estimation techniques, with emphasis placed on linear and stepwise regressions and case-based reasoning (CBR). To date only one type of CBR technique has been employed in Web engineering. We believe results obtained from that study may have been biased, given that other CBR techniques can also be used for effort prediction. Consequently, the first objective of this study is to compare the prediction accuracy of three CBR techniques for estimating the effort to develop Web hypermedia applications, and to choose the one with the best estimates. The second objective is to compare the prediction accuracy of the best CBR technique against two commonly used prediction models, namely stepwise regression and regression trees. One dataset was used in the estimation process, and the results showed that stepwise regression gave the best predictions.
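
Case-based reasoning for effort estimation typically retrieves the most similar past projects and adapts their efforts. The sketch below shows one common variant: min-max scaled features, unweighted Euclidean distance, and the mean effort of the k nearest analogues. It is not claimed to match any of the three CBR techniques compared in the paper. The feature meanings (e.g. page and image counts) and the data are hypothetical.

```python
import math


def normalize(rows):
    """Min-max scale each feature column to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in rows]


def analogy_estimate(features, efforts, target, k=2):
    """Mean effort of the k past projects closest to `target`."""
    scaled = normalize(features + [target])
    past, query = scaled[:-1], scaled[-1]
    ranked = sorted(range(len(past)), key=lambda i: math.dist(past[i], query))
    nearest = ranked[:k]
    return sum(efforts[i] for i in nearest) / len(nearest)


# Hypothetical Web projects: [pages, images] and effort in person-hours.
features = [[10, 3], [25, 5], [40, 8], [12, 2]]
efforts = [50.0, 120.0, 300.0, 60.0]
print(analogy_estimate(features, efforts, [11, 3], k=2))  # 55.0
print(analogy_estimate(features, efforts, [11, 3], k=1))  # 50.0
```

Scaling before computing distances keeps a feature with a large numeric range from dominating the similarity measure.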

5.

Software development effort prediction of industrial projects applying a general regression neural network

Cuauhtemoc Lopez-Martin, Claudia Isaza, Arturo Chavoya. Empirical Software Engineering, 2012, 17(6): 738-756

An important factor for planning, budgeting and bidding a software project is prediction of the development effort required to complete it. This prediction can be obtained from models based on neural networks. The hypothesis of this research was the following: the effort prediction accuracy of a general regression neural network (GRNN) model is statistically equal to or better than that obtained by a statistical regression model, using data obtained from industrial environments. Each model was generated from a separate dataset obtained from the International Software Benchmarking Standards Group (ISBSG) software projects repository. Each of the two models was then validated using a new dataset from the same ISBSG repository. Results obtained from an analysis of variance of the models' accuracies suggest that a GRNN could be an alternative for predicting the development effort of software projects developed in industrial environments.
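
A GRNN in Specht's formulation is essentially kernel regression. Each training project is a pattern unit with a Gaussian kernel, and the output is the kernel-weighted mean of the training targets. The minimal sketch below assumes already-scaled numeric inputs. The single smoothing parameter `sigma`, the function name and the toy data are illustrative and are not taken from the paper.

```python
import math


def grnn_predict(train_x, train_y, x, sigma=1.0):
    """GRNN (Specht-style) estimate for input vector `x`.

    Each training project is a pattern unit with a Gaussian kernel of width
    `sigma`; the output is the kernel-weighted mean of the training efforts.
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in train_x]
    closest = min(d2)  # shift distances so the largest weight is exactly 1
    weights = [math.exp(-(d - closest) / (2 * sigma ** 2)) for d in d2]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


# Hypothetical one-feature data (e.g. a scaled size measure) and efforts.
train_x = [[0.0], [1.0], [2.0]]
train_y = [10.0, 20.0, 30.0]
print(grnn_predict(train_x, train_y, [1.0], sigma=0.5))
print(grnn_predict(train_x, train_y, [0.1], sigma=0.01))
```

Small values of `sigma` make the network behave like a nearest-neighbour lookup, while large values pull every estimate toward the mean of the training efforts. Shifting by the smallest distance avoids dividing by zero when all kernels would otherwise underflow.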

6.

Estimation and prediction metrics for adaptive maintenance effort of object-oriented systems

F. Fioravanti, P. Nesi. IEEE Transactions on Software Engineering, 2001, 27(12): 1062-1084

Many software systems built in recent years have been developed using object-oriented technology and, in some cases, they already need adaptive maintenance in order to satisfy market and customer needs. In most cases, estimating and predicting maintenance effort is difficult due to the lack of metrics and suitable models. In this paper, a model and metrics for estimation/prediction of adaptive maintenance effort are presented and compared with some other solutions taken from the literature. The proposed model can be used as a general approach for adopting well-known metrics (typically used for the estimation of development effort) for the estimation/prediction of adaptive maintenance effort. The proposed model and metrics have been validated against real data using multilinear regression analysis. The validation has shown that several well-known metrics can be profitably employed for the estimation/prediction of maintenance effort.

7.

Source code size estimation approaches for object-oriented systems from UML class diagrams: A comparative study

Information and Software Technology, 2014, 56(2): 220-237

Background: Source code size in terms of SLOC (source lines of code) is the input of many parametric software effort estimation models. However, it is unavailable at the early phase of software development.
Objective: We investigate the accuracy of early SLOC estimation approaches for an object-oriented system using the information collected from its UML class diagram, which is available at the early software development phase.
Method: We use different modeling techniques to build prediction models for investigating the accuracy of six types of metrics to estimate SLOC. The techniques used include linear models, non-linear models, rule/tree-based models, and instance-based models. The investigated metrics are class diagram metrics, predictive object points, object-oriented project size metric, fast&serious class points, objective class points, and object-oriented function points.
Results: Based on 100 open-source Java systems, we find that the prediction model built using the object-oriented project size metric and ordinary least squares regression with a logarithmic transformation achieves the highest accuracy (mean MMRE = 0.19 and mean Pred(25) = 0.74).
Conclusion: We recommend using the object-oriented project size metric and ordinary least squares regression with a logarithmic transformation to build a simple, accurate, and comprehensible SLOC estimation model.
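
The best configuration reported above is ordinary least squares on log-transformed data, scored with MMRE and Pred(25). It can be sketched as follows. Here `x` stands for a size metric taken from the class diagram and `y` for SLOC. The single-predictor form, the function names and the synthetic data are assumptions for illustration.

```python
import math


def fit_loglog_ols(x, y):
    """OLS fit of ln(y) = a + b * ln(x); returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    return my - b * mx, b


def predict(a, b, x):
    """Back-transform: y_hat = exp(a) * x ** b (no smearing correction)."""
    return math.exp(a) * x ** b


def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pred(actual, predicted, level=0.25):
    """Pred(level): share of estimates whose MRE is at most `level`."""
    hits = sum(abs(a - p) / a <= level for a, p in zip(actual, predicted))
    return hits / len(actual)


# Synthetic data following y = 50 * x^0.9 exactly, so the fit recovers it.
xs = [10.0, 20.0, 40.0, 80.0]
ys = [50.0 * v ** 0.9 for v in xs]
a, b = fit_loglog_ols(xs, ys)
print(round(math.exp(a), 6), round(b, 6))  # 50.0 0.9

actual = [100.0, 100.0, 100.0, 100.0]
estimates = [110.0, 130.0, 80.0, 100.0]
print(mmre(actual, estimates), pred(actual, estimates))
```

Fitting in log space turns the multiplicative model y = e^a * x^b into a straight line. This is why a log transformation often suits size data that span several orders of magnitude.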

8.

A PSO-based model to increase the accuracy of software development effort estimation

Vahid Khatibi Bardsiri, Dayang Norhayati Abang Jawawi, Siti Zaiton Mohd Hashim, Elham Khatibi. Software Quality Journal, 2013, 21(3): 501-526

Development effort is one of the most important metrics that must be estimated in order to design the plan of a project. The uncertainty and complexity of software projects make the process of effort estimation difficult and ambiguous. Analogy-based estimation (ABE) is the most common method in this area because it is quite straightforward and practical, relying on comparison between new projects and completed projects to estimate the development effort. Despite many advantages, ABE is unable to produce accurate estimates when the importance level of project features is not the same or the relationship among features is difficult to determine. In such situations, efficient feature weighting can be a solution to improve the performance of ABE. This paper proposes a hybrid estimation model based on a combination of a particle swarm optimization (PSO) algorithm and ABE to increase the accuracy of software development effort estimation. This combination leads to accurate identification of projects that are similar, based on optimizing the performance of the similarity function in ABE. A framework is presented in which the appropriate weights are allocated to project features so that the most accurate estimates are achieved. The suggested model is flexible enough to be used in different datasets including categorical and non-categorical project features. Three real data sets are employed to evaluate the proposed model, and the results are compared with other estimation models. The promising results show that a combination of PSO and ABE could significantly improve the performance of existing estimation models.

9.

An Artificial Neural Network-Based Model for Effective Software Development Effort Estimation

Junaid Rashid, Sumera Kanwal, Muhammad Wasif Nisar, Jungeun Kim, Amir Hussain. Computer Systems Science and Engineering, 2023, 44(2): 1309-1321

In project management, effective cost estimation is one of the most crucial activities for managing resources efficiently, because it predicts the cost required to fulfill a given task. However, finding the best estimation results in software development is challenging. Thus, accurate estimation of software development effort is always a concern for many companies. In this paper, we propose a novel software development effort estimation model based on both the constructive cost model II (COCOMO II) and an artificial neural network (ANN). The artificial neural network enhances the COCOMO model, and the value of the baseline effort constant A is calibrated for use in the proposed model equation. Three state-of-the-art publicly available datasets are used for experiments. A feedforward neural network was trained with backpropagation by iteratively processing the training set, and the proposed model was then tested on the test set. The estimated effort is compared with the actual effort value. Experimental results show that the effort estimated by the proposed model is very close to the actual effort, thus enhancing reliability and improving software effort estimation accuracy.
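
For reference, the COCOMO II post-architecture effort equation that the proposed model builds on is PM = A * Size^E * prod(EM_i), with E = B + 0.01 * sum(SF_j). The published COCOMO II.2000 constants are A = 2.94 and B = 0.91, with Size in KSLOC. The sketch below evaluates the equation. As a deliberately simple stand-in for the paper's ANN-based calibration, it also refits A by least squares in log space. That calibration method, the function names and the synthetic projects are assumptions and do not describe the authors' procedure.

```python
import math


def cocomo2_effort(ksloc, scale_factors, effort_multipliers, A=2.94, B=0.91):
    """COCOMO II post-architecture effort in person-months.

    PM = A * Size^E * prod(EM), with E = B + 0.01 * sum(SF).
    """
    E = B + 0.01 * sum(scale_factors)
    return A * ksloc ** E * math.prod(effort_multipliers)


def calibrate_A(projects, B=0.91):
    """Fit A by least squares on log effort, holding E and the EMs fixed.

    `projects` is a list of (ksloc, scale_factors, effort_multipliers, actual_pm).
    """
    logs = [
        math.log(pm) - math.log(cocomo2_effort(k, sf, em, A=1.0, B=B))
        for k, sf, em, pm in projects
    ]
    return math.exp(sum(logs) / len(logs))


# Nominal COCOMO II.2000 scale-factor ratings (PREC, FLEX, RESL, TEAM, PMAT).
nominal = [3.72, 3.04, 4.24, 3.29, 4.68]
print(cocomo2_effort(100, nominal, [1.0] * 17))

# Synthetic projects generated with A = 3.5; calibration should recover it.
synthetic = [
    (k, nominal, [1.0, 1.1], cocomo2_effort(k, nominal, [1.0, 1.1], A=3.5))
    for k in (10, 50, 200)
]
print(calibrate_A(synthetic))
```

Because A is a pure multiplier, the log-space least-squares fit reduces to the geometric mean of actual effort divided by the A = 1 estimate.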

10.

Optimization of analogy weights by genetic algorithm for software effort estimation

Information and Software Technology, 2006, 48(11): 1034-1045

A reliable and accurate estimate of software development effort has always been a challenge for both the software industry and academia. Analogy is a widely adopted problem solving technique that has been evaluated and confirmed in software effort or cost estimation domains. Similarity measures between pairs of effort drivers play a central role in analogy-based estimation models. However, hardly any research has addressed the issue of how to decide on suitable weighted similarity measures for software effort drivers. The present paper investigates the effect on estimation accuracy of adopting a genetic algorithm (GA) to determine appropriate weighted similarity measures of effort drivers in analogy-based software effort estimation models. Three weighted analogy methods are investigated, namely the unequally weighted, the linearly weighted and the nonlinearly weighted methods. We illustrate our approaches with data obtained from the International Software Benchmarking Standards Group (ISBSG) repository and the IBM DP services database. The experimental results show that applying GA to determine suitable weighted similarity measures of software effort drivers in analogy-based software effort estimation models is a feasible approach to improving the accuracy of software effort estimates. They also show that the nonlinearly weighted analogy method yields more accurate estimates than the other methods.
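
The idea of letting a GA choose feature weights for analogy-based estimation can be sketched in a few dozen lines. The sketch uses one weight per feature, roughly in the spirit of the "unequally weighted" variant, and a weighted Euclidean distance. Fitness is the leave-one-out mean absolute error of nearest-analogy estimates. The GA operators (truncation selection, one-point crossover, clipped Gaussian mutation), all parameter values, the function names and the data are assumptions for illustration and do not describe the paper's configuration.

```python
import math
import random


def weighted_distance(a, b, w):
    """Weighted Euclidean distance between two pre-scaled feature vectors."""
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))


def loo_mae(features, efforts, w, k=1):
    """Leave-one-out mean absolute error of k-nearest-analogy estimates."""
    errors = []
    for i, target in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: weighted_distance(features[j], target, w))
        estimate = sum(efforts[j] for j in others[:k]) / k
        errors.append(abs(efforts[i] - estimate))
    return sum(errors) / len(errors)


def ga_weights(features, efforts, pop_size=20, generations=30,
               mutation_rate=0.2, seed=1):
    """Evolve one weight in [0, 1] per feature to minimise LOO MAE."""
    rng = random.Random(seed)
    n = len(features[0])

    def fitness(w):
        return loo_mae(features, efforts, w)

    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [
                min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                if rng.random() < mutation_rate else g
                for g in child
            ]  # Gaussian mutation, clipped to [0, 1]
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)


# Hypothetical scaled data: feature 0 drives effort, feature 1 is noise.
features = [[i / 9, ((i * 7) % 10) / 9] for i in range(10)]
efforts = [100.0 + 50.0 * i for i in range(10)]
best = ga_weights(features, efforts)
print(best, loo_mae(features, efforts, best))
```

Giving all weight to feature 0 yields a leave-one-out MAE of 50, which is the lowest achievable here because every other project differs in effort by at least 50. A GA run should push the weight on the noise feature down toward that configuration.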

11.

A probabilistic model for predicting software development effort

P.C. Pendharkar, G.H. Subramanian, J.A. Rodger. IEEE Transactions on Software Engineering, 2005, 31(7): 615-624

Recently, Bayesian probabilistic models have been used for predicting software development effort. One of the reasons for the interest in the use of Bayesian probabilistic models, when compared to traditional point forecast estimation models, is that Bayesian models provide tools for risk estimation and allow decision-makers to combine historical data with subjective expert estimates. In this paper, we use a Bayesian network model and illustrate how a belief updating procedure can be used to incorporate decision-making risks. We develop a causal model from the literature and, using a data set of 33 real-world software projects, we illustrate how decision-making risks can be incorporated in the Bayesian networks. We compare the predictive performance of the Bayesian model with popular nonparametric neural-network and regression tree forecasting models and show that the Bayesian model is a competitive model for forecasting software development effort.

12.

A comparative study of two software development cost modeling techniques using multi-organizational and company-specific data

Information and Software Technology, 2000, 42(14): 1009-1016

This research examined the use of the International Software Benchmarking Standards Group (ISBSG) repository for estimating effort for software projects in an organization not involved in ISBSG. The study investigates two questions: (1) What are the differences in accuracy between ordinary least-squares (OLS) regression and Analogy-based estimation? (2) Is there a difference in accuracy between estimates derived from the multi-company ISBSG data and estimates derived from company-specific data? Regarding the first question, we found that OLS regression performed as well as Analogy-based estimation when using company-specific data for model building. Using multi-company data the OLS regression model provided significantly more accurate results than Analogy-based predictions. Addressing the second question, we found in general that models based on the company-specific data resulted in significantly more accurate estimates.

13.

On the value of outlier elimination on software effort estimation research

Yeong-Seok Seo, Doo-Hwan Bae. Empirical Software Engineering, 2013, 18(4): 659-698

Producing accurate and reliable software effort estimates has always been a challenge for both academic research and the software industry. Data quality is an important factor that affects the accuracy of effort estimation methods. To assess the impact of data quality, we investigated the effect of eliminating outliers on the estimation accuracy of commonly used software effort estimation methods. Guided by three research questions, we analyzed the influence of outlier elimination on the accuracy of software effort estimation. We applied five outlier elimination methods (least trimmed squares, Cook’s distance, K-means clustering, box plot, and Mantel leverage metric) in combination with two effort estimation methods (least squares regression, and estimation by analogy with varying parameters). Empirical experiments were performed using industrial data sets: ISBSG Release 9, the Bank and Stock data sets collected from financial companies, and the Desharnais data set from the PROMISE repository. The effect of the outlier elimination methods was evaluated with statistical tests (the Friedman test and the Wilcoxon signed-rank test). The experimental results derived from the evaluation criteria showed no substantial difference between software effort estimation results with and without outlier elimination. However, statistical analysis indicated that outlier elimination leads to a significant improvement in estimation accuracy on the Stock data set for some combinations of outlier elimination and effort estimation methods. In addition, outlier elimination did not lead to a significant improvement in estimation accuracy on the other data sets. Even so, our graphical analysis of errors showed that outlier elimination can increase the likelihood of producing more accurate effort estimates for new software projects.
Therefore, from a practical point of view, software organizations should consider outlier elimination and analyze effort estimation results in detail to improve the accuracy of software effort estimation.
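
Two of the five outlier-elimination methods named above are simple enough to sketch with the standard library: the box-plot rule and Cook's distance. Here Cook's distance is computed for a simple one-predictor regression. The whisker factor of 1.5 and the 4/n cut-off for Cook's distance are common rules of thumb and are not thresholds taken from the paper. The function names and data are hypothetical.

```python
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Indices of values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def cooks_distance(x, y):
    """Cook's distance of each point in a simple linear regression y ~ a + b*x.

    Assumes n > 2 and an imperfect fit (non-zero residual variance).
    """
    n, p = len(x), 2  # p = number of fitted parameters (a and b)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    distances = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx  # leverage of point i
        distances.append(e * e / (p * mse) * h / (1 - h) ** 2)
    return distances


project_efforts = [10, 12, 11, 13, 12, 11, 95]
print(boxplot_outliers(project_efforts))  # [6]

sizes = [1, 2, 3, 4, 5, 6]
efforts = [2, 4, 6, 8, 10, 30]
d = cooks_distance(sizes, efforts)
print([i for i, di in enumerate(d) if di > 4 / len(d)])  # [5]
```

The box plot looks at one variable at a time. Cook's distance instead flags points that would noticeably shift the fitted regression, so the two methods can disagree on the same data set.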

14.

Factors affecting duration and effort estimation errors in software development projects

Information and Software Technology, 2007, 49(8): 827-837

The purpose of this research was to fill a gap in the literature pertaining to the influence of project uncertainty and managerial factors on duration and effort estimation errors. Four dimensions were considered: project uncertainty, use of estimation development processes, use of estimation management processes, and the estimator’s experience. Correlation analysis and linear regression models were used to test the model and the hypotheses on the relations between the four dimensions and estimation errors. The sample comprised 43 internal software development projects executed during 2002 in the IT division of a large government organization in Israel. Our findings indicate that, in general, a high level of uncertainty is associated with higher effort estimation errors. Conversely, increased use of estimation development processes and estimation management processes, as well as greater estimator experience, are correlated with lower duration estimation errors. From a practical perspective, the specific findings of this study can be used as guidelines for better duration and effort estimation. Several practices may reduce estimation errors: accounting for project uncertainty while managing expectations regarding estimate accuracy, investing more in detailed planning, and selecting estimators based on the number of projects they have managed rather than their cumulative experience in project management.

15.

A hybrid heuristic approach to optimize rule-based software quality estimation models

D. Azar, H. Harmanani, R. Korkmaz. Information and Software Technology, 2009, 51(9): 1365-1376

Software quality is defined as the degree to which a software component or system meets specified requirements and specifications. Assessing software quality in the early stages of design and development is crucial as it helps reduce effort, time and money. However, the task is difficult since most software quality characteristics (such as maintainability, reliability and reusability) cannot be directly and objectively measured before the software product is deployed and used for a certain period of time. Nonetheless, these software quality characteristics can be predicted from other measurable software quality attributes such as complexity and inheritance. Many metrics have been proposed for this purpose. In this context, we speak of estimating software quality characteristics from measurable attributes. For this purpose, software quality estimation models have been widely used. These take different forms: statistical models, rule-based models and decision trees. However, the data used to build such models are scarce in the domain of software quality. As a result, the accuracy of the built estimation models deteriorates when they are used to predict the quality of new software components. In this paper, we propose a search-based software engineering approach to improve the prediction accuracy of software quality estimation models by adapting them to new, unseen software products. The method has been implemented, and favorable comparative results are reported.

16.

Re-estimating software effort using prior phase efforts and data mining techniques

Pichai Jodpimai, Peraphon Sophatsathit, Chidchanok Lursinsap. Innovations in Systems and Software Engineering, 2018, 14(3): 209-228

Software effort estimation has played an important role in software project management. An accurate estimate helps reduce cost overruns and eventual project failure. Unfortunately, many existing estimation techniques rely on the total project effort, which is often determined from the project life cycle. As the project moves on, the course of action deviates from what was originally planned, despite close monitoring and control. This calls for re-estimating software effort so as to improve project operating costs and budgeting. Recent research has explored phase-level estimation, which uses known information from prior development phases and different learning techniques to predict the effort of the next phase. This study investigates how preprocessing prior-phase data influences the learning techniques used to re-estimate the effort of the next phase. The proposed re-estimation approach preprocesses prior-phase effort by means of statistical techniques to select a set of input features for learning, which are in turn used to generate the estimation models. These models then re-estimate next-phase effort through four processing steps, namely data transformation, outlier detection, feature selection, and learning. An empirical study is conducted on 440 estimation models generated from combinations of 5 data transformation, 5 outlier detection, 5 feature selection, and 5 learning techniques. The experimental results show that suitable preprocessing significantly helps learning techniques improve re-estimation accuracy. However, no single learning technique outperforms the others across all phases. The proposed re-estimation approach yields more accurate estimates than the proportion-based estimation approach.
It is envisioned that the proposed re-estimation approach can help researchers and project managers re-estimate software effort so as to finish projects on time and within the allotted budget.

17.

Investigating the use of duration-based moving windows to improve software effort prediction: A replicated study

Information and Software Technology, 2014, 56(9): 1063-1075

Context: Most research in software effort estimation has not considered chronology when selecting projects for training and testing sets. A chronological split represents the use of a project’s starting and completion dates, such that any model that estimates effort for a new project p only uses as training data projects that were completed prior to p’s start. Four recent studies investigated the use of chronological splits, using moving windows wherein only the most recent projects completed prior to a project’s starting date were used as training data. The first three studies (S1–S3) found some evidence in favor of using windows; they all defined window sizes as fixed numbers of recent projects. In practice, we suggest that estimators think in terms of elapsed time rather than the size of the data set when deciding which projects to include in a training set. In the fourth study (S4) we showed that the use of windows based on duration can also improve estimation accuracy.
Objective: This paper’s contribution is to extend S4 using an additional dataset, and also to investigate the effect on accuracy of using moving windows of various durations.
Method: Stepwise multivariate regression was used to build prediction models, using all available training data and also using windows of various durations to select training data. Accuracy was compared based on absolute residuals and MREs; the Wilcoxon test was used to check the statistical significance of differences between results. Accuracy was also compared against estimates derived from windows containing fixed numbers of projects.
Results: Neither fixed-size nor fixed-duration windows provided superior estimation accuracy in the new data set.
Conclusions: Contrary to intuition, our results suggest that it is not always beneficial to exclude old data when estimating effort for new projects. When windows are helpful, windows based on duration are effective.
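
A duration-based window differs from a fixed-size one only in how training projects are filtered. The minimal sketch below assumes the window counts back from the new project's start date to each candidate's completion date. The tuple layout, the function name, the window lengths and the dates are hypothetical.

```python
from datetime import date, timedelta


def duration_window(projects, target_start, window_days):
    """Projects completed before `target_start` and at most `window_days`
    days earlier. Each project is a (name, completion_date, effort) tuple."""
    earliest = target_start - timedelta(days=window_days)
    return [p for p in projects if earliest <= p[1] < target_start]


# Hypothetical completed projects: (name, completion date, effort).
projects = [
    ("A", date(2010, 1, 10), 500.0),
    ("B", date(2011, 3, 1), 700.0),
    ("C", date(2012, 3, 1), 650.0),
    ("D", date(2012, 8, 1), 900.0),  # completes after the new project starts
]
start = date(2012, 6, 1)
print([p[0] for p in duration_window(projects, start, 365)])  # ['C']
print([p[0] for p in duration_window(projects, start, 730)])  # ['B', 'C']
```

Unlike a fixed-size window, the number of training projects here varies with how many projects happened to finish in the chosen period. A sparse period can therefore leave too few projects to fit a regression model.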

18.

Neural network models for software development effort estimation: a comparative study

Ali Bou Nassif, Mohammad Azzeh, Luiz Fernando Capretz, Danny Ho. Neural Computing and Applications, 2016, 27(8): 2369-2381

Software development effort estimation (SDEE) is one of the main tasks in software project management. It is crucial for a project manager to efficiently predict the effort or cost of a software project in a bidding process, since overestimation will lead to bidding loss and underestimation will cause the company to lose money. Several SDEE models exist; machine learning models, especially neural network models, are among the most prominent in the field. In this study, four different neural network models (multilayer perceptron, general regression neural network, radial basis function neural network, and cascade correlation neural network) are compared with each other based on three criteria. These are (1) predictive accuracy based on the mean absolute error criterion, (2) whether such a model tends to overestimate or underestimate, and (3) how each model classifies the importance of its inputs. Industrial datasets from the International Software Benchmarking Standards Group (ISBSG) are used to train and validate the four models. The main ISBSG dataset was filtered and then divided into five datasets based on the productivity value of each project. Results show that the four models tend to overestimate in 80% of the datasets, and that the significance of the model inputs varies with the selected model. Furthermore, the cascade correlation neural network outperforms the other three models in the majority of the constructed datasets, based on the mean absolute residual criterion.
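
The first two comparison criteria listed above can be made concrete with a few helper functions. These are mean absolute error, the mean signed error, and the share of projects that are overestimated. They are generic definitions. The abstract does not state exactly how the study measured the tendency to overestimate, so treat the signed-error and share metrics as assumptions. The data are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| (mean absolute residual)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_error(actual, predicted):
    """Mean of (predicted - actual): positive means overestimation on average."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def overestimate_share(actual, predicted):
    """Fraction of projects whose estimate exceeds the actual effort."""
    return sum(p > a for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
print(mean_absolute_error(actual, predicted))
print(mean_error(actual, predicted))  # 10.0
print(overestimate_share(actual, predicted))
```

MAE ignores direction, so two models with equal MAE can differ sharply in bias. For bidding decisions that difference matters, which is why reporting a signed measure alongside MAE is useful.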

19.

Neural network based models for software effort estimation: a review

Vachik S. Dave, Kamlesh Dutta. Artificial Intelligence Review, 2014, 42(2): 295-307

Prediction of software development effort is a key task for the effective management of any software organization. The accuracy and reliability of prediction mechanisms are also important. Neural-network-based models are competitive with traditional regression and statistical models for software effort estimation. This comprehensive article covers various neural-network-based models for software estimation as presented by various researchers. The review of twenty-one articles covers a range of features used for effort prediction. This survey aims to support research on effort prediction and to emphasize the capabilities of neural-network-based models in effort prediction.

20.

Predictive accuracy comparison between neural networks and statistical regression for development effort of software projects

Applied Soft Computing, 2015

To better predict the costs, schedule, and risks of a software project, a more accurate prediction of its development effort is needed. Among the main prediction techniques are those based on mathematical models, such as statistical regressions or machine learning (ML). Studies applying ML models to development effort prediction have mainly based their conclusions on analyses with the following weaknesses: (1) using an accuracy criterion that leads to asymmetry, (2) applying a validation method that causes conclusion instability by randomly selecting the samples for training and testing the models, (3) omitting an explanation of how the parameters of the neural networks were determined, (4) drawing conclusions from models that were not trained and tested on mutually exclusive data sets, (5) omitting an analysis of the dependence, variance and normality of the data when selecting a suitable statistical test for comparing accuracies among models, and (6) reporting results without showing a statistically significant difference. In this study, these six issues are addressed when comparing the prediction accuracy of a radial basis function neural network (RBFNN) with three other models. These are a statistical regression (the model most frequently compared with ML models), a feedforward multilayer perceptron (MLP, the model most commonly used in effort prediction of software projects), and a general regression neural network (GRNN, an RBFNN variant). The hypothesis tested is the following: the accuracy of effort prediction for RBFNN is statistically better than that obtained from a simple linear regression (SLR), MLP and GRNN when adjusted function points data, obtained from software projects, are used as the independent variable. Samples of new and enhancement projects were obtained from the International Software Benchmarking Standards Group (ISBSG) Release 11. The models were trained and tested using leave-one-out cross-validation.
The models were evaluated using absolute residuals and a Friedman statistical test. The results showed a statistically significant difference in accuracy among the four models for new projects, but not for enhancement projects. For new projects, RBFNN was more accurate than SLR at the 99% confidence level, whereas MLP and GRNN were more accurate than SLR at the 90% confidence level.
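
The evaluation protocol described above is leave-one-out cross-validation scored by absolute residuals. The sketch below applies it to the simple linear regression baseline, with adjusted function points as the single independent variable. The function names and sample numbers are hypothetical. Any other model could be plugged in through the `fit` argument, which must return a predict function. The sketch does not perform the Friedman test on the resulting residuals.

```python
def fit_slr(x, y):
    """Simple linear regression y = a + b*x; returns a predict function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return lambda v: a + b * v


def loo_absolute_residuals(x, y, fit=fit_slr):
    """Leave-one-out: refit without project i, then record |y_i - y_hat_i|."""
    residuals = []
    for i in range(len(x)):
        model = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        residuals.append(abs(y[i] - model(x[i])))
    return residuals


afp = [100.0, 200.0, 300.0]  # hypothetical adjusted function points
effort = [1000.0, 2000.0, 10000.0]  # hypothetical person-hours
print(loo_absolute_residuals(afp, effort))  # [7000.0, 3500.0, 7000.0]
```

Because every project is held out exactly once, the split involves no randomness. This addresses the conclusion-instability weakness (2) listed in the abstract.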