Similar Literature
A total of 20 similar documents were retrieved.
1.
To better predict the costs, schedule, and risks of a software project, it is necessary to predict its development effort more accurately. Among the main prediction techniques are those based on mathematical models, such as statistical regression or machine learning (ML). ML models applied to development effort prediction have often drawn their conclusions in spite of the following methodological weaknesses: (1) using an accuracy criterion that leads to asymmetry, (2) applying a validation method that causes conclusion instability by randomly selecting the samples for training and testing the models, (3) omitting an explanation of how the neural network parameters were determined, (4) generating conclusions from models that were not trained and tested on mutually exclusive data sets, (5) omitting an analysis of the dependence, variance, and normality of the data when selecting the statistical test for comparing accuracy among models, and (6) reporting results without showing a statistically significant difference. In this study, these six issues are addressed when comparing the prediction accuracy of a radial basis function neural network (RBFNN) with those of statistical regression (the model most frequently compared with ML models), a feedforward multilayer perceptron (MLP, the model most commonly used for software project effort prediction), and a general regression neural network (GRNN, an RBFNN variant). The hypothesis tested is the following: the effort prediction accuracy of the RBFNN is statistically better than that obtained from simple linear regression (SLR), the MLP, and the GRNN when adjusted function points data obtained from software projects are used as the independent variable. Samples from the International Software Benchmarking Standards Group (ISBSG) Release 11 related to new and enhanced projects were used. The models were trained and tested with a leave-one-out cross-validation method and evaluated on absolute residuals, compared by means of a Friedman statistical test. The results showed a statistically significant difference in accuracy among the four models for new projects, but not for enhanced projects. For new projects, the RBFNN was more accurate than SLR at the 99% confidence level, whereas the MLP and GRNN were more accurate than SLR at the 90% confidence level.
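The evaluation methodology described above (leave-one-out cross-validation, absolute residuals, and a Friedman test) can be sketched in a few lines of Python. This is a minimal illustration rather than the study's pipeline: the function-point data are synthetic, scikit-learn's SLR and MLP stand in for the full set of models (scikit-learn provides no RBFNN or GRNN), and a naive mean predictor supplies the third related sample the Friedman test requires.

```python
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(50, 2000, size=(60, 1))          # adjusted function points (synthetic)
y = 8.0 * X[:, 0] + rng.normal(0.0, 300.0, 60)   # effort in person-hours (synthetic)

models = {
    "SLR": LinearRegression(),
    "MLP": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(8,),
                                      max_iter=5000, random_state=0)),
}
abs_residuals = {name: [] for name in models}
mean_baseline = []                               # naive mean predictor, third sample

for train_idx, test_idx in LeaveOneOut().split(X):
    for name, model in models.items():
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])[0]
        abs_residuals[name].append(abs(y[test_idx][0] - pred))
    mean_baseline.append(abs(y[test_idx][0] - y[train_idx].mean()))

# Friedman test over the three related samples of absolute residuals
stat, p = friedmanchisquare(abs_residuals["SLR"], abs_residuals["MLP"], mean_baseline)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```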

2.
This research examined the use of the International Software Benchmarking Standards Group (ISBSG) repository for estimating effort for software projects in an organization not involved in ISBSG. The study investigates two questions: (1) What are the differences in accuracy between ordinary least-squares (OLS) regression and analogy-based estimation? (2) Is there a difference in accuracy between estimates derived from the multi-company ISBSG data and estimates derived from company-specific data? Regarding the first question, we found that OLS regression performed as well as analogy-based estimation when company-specific data were used for model building. With multi-company data, the OLS regression model provided significantly more accurate results than analogy-based predictions. Addressing the second question, we found that, in general, models based on company-specific data produced significantly more accurate estimates.

3.
Developing Project Duration Models in Software Engineering
Based on the empirical analysis of data contained in the International Software Benchmarking Standards Group (ISBSG) repository, this paper presents software engineering project duration models based on project effort. Duration models are built for the entire dataset and for subsets of projects developed for personal computer, mid-range and mainframe platforms. Duration models are also constructed for projects requiring fewer than 400 person-hours of effort and for projects requiring more than 400 person-hours of effort. The usefulness of adding the maximum number of assigned resources as a second independent variable to explain duration is also analyzed. The opportunity to build duration models directly from project functional size in function points is investigated as well.

4.
Software development effort estimation (SDEE) is one of the main tasks in software project management. It is crucial for a project manager to predict the effort or cost of a software project accurately during a bidding process, since overestimation will lead to bidding loss and underestimation will cause the company to lose money. Several SDEE models exist; machine learning models, especially neural network models, are among the most prominent in the field. In this study, four different neural network models (multilayer perceptron, general regression neural network, radial basis function neural network, and cascade correlation neural network) are compared with each other based on: (1) predictive accuracy centred on the mean absolute error criterion, (2) whether a model tends to overestimate or underestimate, and (3) how each model ranks the importance of its inputs. Industrial datasets from the International Software Benchmarking Standards Group (ISBSG) are used to train and validate the four models. The main ISBSG dataset was filtered and then divided into five datasets based on the productivity value of each project. Results show that the four models tend to overestimate in 80% of the datasets and that the significance of the model inputs varies with the selected model. Furthermore, the cascade correlation neural network outperforms the other three models on the majority of the datasets, based on the mean absolute residual criterion.

5.
Context: The International Software Benchmarking Standards Group (ISBSG) maintains a software development repository with over 6000 software projects. This dataset makes it possible to estimate a project's size, effort, duration, and cost. Objective: The aim of this study was to determine how, and to what extent, ISBSG has been used by researchers from 2000, when the first papers were published, until June 2012. Method: A systematic mapping review was used as the research method and was applied to the 129 papers obtained after the filtering process. Results: The papers were published in 19 journals and 40 conferences. Thirty-five percent of the papers published between 2000 and 2011 have received at least one citation in journals, and only five papers have received six or more citations. The effort variable is the focus of 70.5% of the papers, 22.5% center their research on a variable other than effort, and 7% do not consider any target variable. Additionally, in as many as 70.5% of the papers, effort estimation is the research topic, followed by dataset properties (36.4%). The most frequent methods are regression (61.2%), machine learning (35.7%), and estimation by analogy (22.5%). ISBSG is used as the sole data source in 55% of the papers, while the remaining papers use complementary datasets. ISBSG Release 10 is used most frequently, with 32 references. Finally, some benefits and drawbacks of the usage of ISBSG are highlighted. Conclusion: This work presents a snapshot of the existing usage of ISBSG in software development research. ISBSG offers a wealth of information regarding practices from a wide range of organizations, applications, and development types, which constitutes its main potential. However, a data preparation process is required before any analysis. Lastly, the potential of ISBSG for new research is also outlined.

6.
Context: Although independent imputation techniques have been comprehensively studied in software effort prediction, there are few studies on embedded methods for dealing with missing data in software effort prediction. Objective: We propose the BREM (Bayesian Regression and Expectation Maximization) algorithm for software effort prediction, together with two embedded strategies to handle missing data. Method: The MDT (Missing Data Toleration) strategy ignores the missing data when using BREM for software effort prediction, while the MDI (Missing Data Imputation) strategy uses observed data to impute missing data in an iterative manner while elaborating the predictive model. Results: Experiments on the ISBSG and CSBSG datasets demonstrate that, when there are no missing values in the historical dataset, BREM outperforms LR (Linear Regression), BR (Bayesian Regression), SVR (Support Vector Regression) and the M5′ regression tree in software effort prediction, on the condition that the test set is no greater than 30% of the whole historical dataset for ISBSG and 25% for CSBSG. When there are missing values in the historical datasets, BREM with the MDT and MDI strategies significantly outperforms the independent imputation techniques, including MI, BMI, CMI, MINI and M5′. Moreover, the MDI strategy provides BREM with more accurate imputations of the missing values than the independent imputation techniques, provided that the level of missing data in the training set is no larger than 10% for both the ISBSG and CSBSG datasets. Conclusion: The experimental results suggest that BREM is promising for software effort prediction. When there are missing values, the MDI strategy is the preferred one to embed with BREM.
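The MDI strategy described above, imputing missing values iteratively while refitting the predictive model, resembles the general iterative-imputation idea available in scikit-learn. A minimal sketch, assuming synthetic effort drivers and using IterativeImputer with Bayesian ridge regression as a stand-in for BREM (which has no off-the-shelf implementation):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 3))                  # synthetic effort drivers
y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(0, 1, 100)

X_missing = X.copy()
mask = rng.random(X.shape) < 0.10                      # ~10% missing, as in the study
X_missing[mask] = np.nan

# Iteratively impute each column from the others, refitting per round
imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=20, random_state=1)
X_imputed = imputer.fit_transform(X_missing)

model = BayesianRidge().fit(X_imputed, y)
print("R^2 on imputed data:", model.score(X_imputed, y))
```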

7.
Parametric software effort estimation models usually consist of only a single mathematical relationship. With the advent of software repositories containing data from heterogeneous projects, such models suffer from poor adjustment and predictive accuracy. One possible way to alleviate this problem is to use a set of mathematical equations obtained by dividing the historical project dataset, according to different parameters, into subdatasets called partitions. In turn, partitions are divided into clusters that serve as the basis for more accurate models. In this paper, we describe the process, tool and results of such an approach through a case study using a publicly available repository, ISBSG. Results suggest the adequacy of the technique as an extension of existing single-expression models, without making the estimation process much more complex than one that uses a single estimation model. A tool to support the process is also presented. Keywords: software engineering, software measurement, effort estimation, clustering

8.
This paper explores the relationship between software size, development effort and team size. We propose an approach aimed at finding the team size at which the project effort reaches its minimum. The approach was applied to the ISBSG repository, which contains nearly 4000 software projects. Based on the results, we provide our recommendation for the optimal or near-optimal team size in seven project groups defined by four project properties.
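One simple way to illustrate the idea of a minimizing team size, not necessarily the paper's method, is to fit a convex quadratic of effort on team size within a project group and read the vertex off as the near-optimal team size. The figures below are synthetic:

```python
import numpy as np

team_size = np.array([2, 3, 4, 5, 6, 8, 10, 12], dtype=float)     # synthetic group
effort    = np.array([900, 760, 700, 690, 720, 820, 980, 1200.0]) # person-hours

# np.polyfit returns coefficients highest power first: effort ≈ b2*t^2 + b1*t + b0
b2, b1, b0 = np.polyfit(team_size, effort, deg=2)
t_opt = -b1 / (2 * b2)        # vertex of the parabola (valid only when b2 > 0)
print(f"near-optimal team size ≈ {t_opt:.1f}")
```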

9.
Estimating software project effort using analogies
Accurate project effort prediction is an important goal for the software engineering community. To date, most work has focused on building algorithmic models of effort, for example, COCOMO. These can be calibrated to local environments. We describe an alternative approach to estimation based on the use of analogies. The underlying principle is to characterize projects in terms of features (for example, the number of interfaces, the development method or the size of the functional requirements document). Completed projects are stored, and the problem then becomes one of finding the projects most similar to the one for which a prediction is required. Similarity is defined as Euclidean distance in n-dimensional space, where n is the number of project features. Each dimension is standardized so that all dimensions have equal weight. The known effort values of the nearest neighbors of the new project are then used as the basis for the prediction. The process is automated using a PC-based tool known as ANGEL. The method is validated on nine different industrial datasets (a total of 275 projects), and in all cases analogy outperforms algorithmic models based upon stepwise regression. From this work we argue that estimation by analogy is a viable technique that, at the very least, can be used by project managers to complement current estimation techniques.
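The procedure as described, standardized features, Euclidean distance, prediction from the nearest neighbors, translates almost directly into code. A minimal sketch with hypothetical feature columns and synthetic effort values:

```python
import numpy as np

def analogy_estimate(completed_X, completed_effort, new_x, k=2):
    """Predict effort for new_x from its k nearest completed projects."""
    mu, sigma = completed_X.mean(axis=0), completed_X.std(axis=0)
    Z = (completed_X - mu) / sigma              # standardize: equal weight per dimension
    z_new = (new_x - mu) / sigma
    dist = np.sqrt(((Z - z_new) ** 2).sum(axis=1))   # Euclidean distance in feature space
    nearest = np.argsort(dist)[:k]              # k most similar projects
    return completed_effort[nearest].mean()

# hypothetical columns: number of interfaces, size of requirements document (pages)
X = np.array([[4, 30], [10, 120], [6, 55], [12, 150], [3, 25.0]])
effort = np.array([400, 2100, 800, 2600, 350.0])
print(analogy_estimate(X, effort, np.array([7, 70.0])))
```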

10.
Function Points (FP) is a useful software metric that was first proposed 25 years ago; since then, it has steadily evolved into a functional size metric consolidated in the well-accepted Standardized International Function Point Users Group (IFPUG) Counting Practices Manual, version 4.2. While the software development industry has grown rapidly, the weight values assigned when counting standard FP have remained the same, which raises critical questions about their validity. In this paper, we discuss the concept of calibrating Function Points, with the aims of estimating a more accurate software size that fits the specific software application, reflecting software industry trends, and improving the cost estimation of software projects. An FP calibration model called the Neuro-Fuzzy Function Point Calibration Model (NFFPCM), which integrates the learning ability of neural networks with the ability of fuzzy logic to capture human knowledge, is proposed. Empirical validation using the International Software Benchmarking Standards Group (ISBSG) data repository release 8 shows a 22% accuracy improvement in mean magnitude of relative error (MMRE) in software effort estimation after calibration.
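The MMRE criterion cited above has a simple definition: the mean, over n projects, of |actual − estimated| / actual. A minimal sketch:

```python
import numpy as np

def mmre(actual, estimated):
    """Mean magnitude of relative error over a set of projects."""
    actual, estimated = np.asarray(actual, float), np.asarray(estimated, float)
    return np.mean(np.abs(actual - estimated) / actual)

# (|100|/1000 + |150|/500 + |200|/2000) / 3 = 0.1667
print(mmre([1000, 500, 2000], [900, 650, 1800]))
```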

11.
A reliable and accurate estimate of software development effort has always been a challenge for both the software industry and academia. Analogy is a widely adopted problem-solving technique that has been evaluated and confirmed in the software effort and cost estimation domains. Similarity measures between pairs of effort drivers play a central role in analogy-based estimation models. However, hardly any research has addressed the issue of how to decide on suitable weighted similarity measures for software effort drivers. The present paper investigates the effect on estimation accuracy of adopting a genetic algorithm (GA) to determine appropriate weighted similarity measures for effort drivers in analogy-based software effort estimation models. Three weighted analogy methods, namely the unequally weighted, the linearly weighted and the nonlinearly weighted methods, are investigated. We illustrate our approaches with data obtained from the International Software Benchmarking Standards Group (ISBSG) repository and the IBM DP services database. The experimental results show that applying a GA to determine suitable weighted similarity measures for software effort drivers in analogy-based software effort estimation models is a feasible approach to improving the accuracy of software effort estimates. They also demonstrate that the nonlinearly weighted analogy method yields better estimation accuracy than the other methods.
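The core loop of such an approach, evolving a weight vector for a weighted Euclidean similarity so that leave-one-out analogy estimates minimize MMRE, can be sketched with a toy genetic algorithm. This is an illustration under simplifying assumptions (synthetic data, truncation selection, arithmetic crossover, a single closest analogue), not the paper's exact GA:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(40, 3))              # standardized effort drivers (synthetic)
effort = 500 + 1000 * (0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2]) \
         + rng.normal(0, 20, 40)

def loo_mmre(w):
    """Fitness: leave-one-out MMRE of single-analogue estimates under weights w."""
    errs = []
    for i in range(len(X)):
        d = (w * (X - X[i]) ** 2).sum(axis=1)    # weighted squared Euclidean distance
        d[i] = np.inf                            # exclude the project itself
        j = np.argmin(d)                         # closest analogue
        errs.append(abs(effort[i] - effort[j]) / effort[i])
    return np.mean(errs)

pop = rng.uniform(0, 1, size=(20, 3))            # initial population of weight vectors
for generation in range(30):
    fitness = np.array([loo_mmre(w) for w in pop])
    parents = pop[np.argsort(fitness)[:10]]      # truncation selection
    children = (parents[rng.integers(0, 10, 10)] +
                parents[rng.integers(0, 10, 10)]) / 2    # arithmetic crossover
    children += rng.normal(0, 0.05, children.shape)      # Gaussian mutation
    pop = np.vstack([parents, np.clip(children, 0, 1)])

best = pop[np.argmin([loo_mmre(w) for w in pop])]
print("best weights:", np.round(best, 3))
```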

12.
Software development effort prediction is addressed in several international software process models and standards, such as the Capability Maturity Model Integration (CMMI), ISO-15504 and ISO/IEC 12207. In this paper, data on two kinds of lines of code, gathered from programs developed with practices based on the Personal Software Process (PSP), were used as independent variables in three models for estimating and predicting development effort. Samples of 163 and 80 programs were used for verifying and validating the models, respectively. The prediction accuracy of a multiple linear regression, a general regression neural network, and a fuzzy logic model was compared using the magnitude of error relative to the estimate (MER) and the mean square error (MSE) as criteria. The results supported the following hypothesis: the effort prediction accuracy of a general regression neural network is statistically equal to that obtained by a fuzzy logic model as well as by a multiple linear regression, when new and changed code and reused code obtained from short-scale programs developed with personal practices are used as independent variables.

13.
A common qualitative perception of the software industry is that it finishes its projects late and over budget; from a quantitative point of view, only 39% of software projects finish on time with respect to the schedule set when the project started. This low percentage has been attributed to factors such as unrealistic time frames and poor planning and prediction. The main techniques used for predicting project schedule have been based on expert judgment and on mathematical models. In this study, a new model derived from the multivariate adaptive regression splines (MARS) model is proposed. This new model, optimized MARS (OMARS), uses a simulated annealing process to find a transformation of the input data space prior to applying MARS, in order to improve accuracy when predicting the schedule of software projects. The prediction accuracy of the OMARS model is compared to that of stand-alone MARS and of a multiple linear regression (MLR) model with a logarithmic transformation. The two independent variables used for training and testing the models are the functional size, which corresponds to a composite value of 19 independent variables, and the maximum size of the team of developers. The dataset of projects was obtained from the International Software Benchmarking Standards Group (ISBSG) Release 11. Results based on absolute residuals and on paired t and Wilcoxon statistical tests showed that the prediction accuracy of OMARS is statistically better than that of the MARS and MLR models.
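The statistical comparison described above, paired t and Wilcoxon tests applied to the absolute residuals of competing models, is straightforward with SciPy. A minimal sketch on synthetic residuals (the model names in the comments are only placeholders for OMARS and MLR):

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

rng = np.random.default_rng(3)
abs_resid_model_a = np.abs(rng.normal(100, 30, 50))   # e.g. the proposed model
abs_resid_model_b = np.abs(rng.normal(130, 30, 50))   # e.g. the baseline model

t_stat, t_p = ttest_rel(abs_resid_model_a, abs_resid_model_b)   # paired t-test
w_stat, w_p = wilcoxon(abs_resid_model_a, abs_resid_model_b)    # Wilcoxon signed-rank
print(f"paired t: p = {t_p:.4f}; Wilcoxon: p = {w_p:.4f}")
```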

14.
Prediction of software development effort is a key task for the effective management of any software organization, and the accuracy and reliability of the prediction mechanisms are equally important. Neural network based models are competitive with traditional regression and statistical models for software effort estimation. This comprehensive article covers various neural network based models for software effort estimation as presented by various researchers. The review of twenty-one articles covers a range of features used for effort prediction. The survey aims to support research on effort prediction and to emphasize the capabilities of neural network based models in this task.

15.
Context: In software project management, the distribution of resources to various project activities is one of the most challenging problems, since it affects team productivity, product quality and project constraints related to budget and scheduling. Objective: The study aims to (a) reveal the high complexity of modelling the proportion of effort used in different phases, as well as the divergence from various rules of thumb in the related literature, and (b) present a systematic data analysis framework able to offer better interpretation and visualisation of the effort distributed to specific phases. Method: The basis for the proposed multivariate statistical framework is Compositional Data Analysis, a methodology appropriate for proportions, along with other methods such as measuring deviation from rules of thumb, cluster analysis and analysis of variance. The effort allocations to phases, as reported in around 1500 software projects of the ISBSG R11 repository, were transformed into vectors of proportions of the total effort and analysed with respect to prime project attributes. Results: The proposed statistical framework was able to detect high dispersion among the data, distribution inequality and various interesting correlations, trends, groupings and outliers, especially with respect to other categorical and continuous project attributes. Only a very small number of projects were found to lie close to the rules of thumb from the related literature. Significant differences were found in the proportion of effort spent in different phases for different types of projects. Conclusion: There is no simple model for the effort allocated to the phases of software projects. Data from previous projects can provide valuable information regarding the distribution of effort for various types of projects, through analysis with multivariate statistical methodologies. The proposed statistical framework is generic and can be applied in a similar fashion to any dataset containing effort allocation to phases.
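The core compositional-data step, treating each project's per-phase effort as a vector of proportions and mapping it with the centred log-ratio (clr) transform before ordinary multivariate analysis, can be sketched as follows; the phase names and proportions are illustrative only:

```python
import numpy as np

def clr(proportions):
    """Centred log-ratio transform of a composition (strictly positive parts)."""
    p = np.asarray(proportions, float)
    g = np.exp(np.mean(np.log(p), axis=-1, keepdims=True))  # geometric mean of parts
    return np.log(p / g)

# proportions of total effort in (plan, specify, build, test, implement)
project = np.array([0.05, 0.15, 0.50, 0.20, 0.10])
print(clr(project))          # clr coordinates sum to zero by construction
```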

16.
Accurate estimation of software development effort is strongly associated with the success or failure of software projects. The clear lack of convincing accuracy and flexibility in this area has attracted the attention of researchers over the past few years. Despite the improvements achieved in effort estimation, there is no strong agreement as to which individual model is best. Recent studies have found that an accurate estimate of development effort in software projects is unreachable in the global space, meaning that proposing a single high-performance estimation model for use across different types of software projects is likely impossible. In this paper, a localized multi-estimator model, called LMES, is proposed in which software projects are classified based on underlying attributes. Different clusters of projects are then investigated locally so that the most accurate estimators are selected for each cluster. Unlike prior models, LMES does not rely on only one individual estimator within a cluster of projects. Rather, an exhaustive investigation is conducted to find the best combination of estimators to assign to each cluster. The investigation domain includes 10 estimators combined using four combination methods, which results in 4017 different combinations. The ISBSG, Maxwell and COCOMO datasets, comprising a total of 573 real software projects, are used for evaluation. The promising results show that estimation accuracy is improved through localization of the estimation process and allocation of appropriate estimators. Besides the increased accuracy, the significant contribution of LMES is its adaptability and flexibility in dealing with the complexity and uncertainty that exist in the field of software development effort estimation.
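The localization idea, cluster the projects, then keep whichever candidate estimator cross-validates best within each cluster, can be sketched compactly. This is a simplified single-estimator-per-cluster illustration on synthetic data, not LMES itself (which searches combinations of ten estimators under four combination methods):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(120, 4))              # project attributes (synthetic)
y = 500 + 2000 * X[:, 0] + rng.normal(0, 50, 120) # effort (synthetic)

labels = KMeans(n_clusters=3, n_init=10, random_state=4).fit_predict(X)
candidates = [LinearRegression(), KNeighborsRegressor(n_neighbors=3)]

chosen = {}
for c in range(3):
    Xc, yc = X[labels == c], y[labels == c]
    # higher (less negative) mean score = lower MAE within this cluster
    scores = [cross_val_score(m, Xc, yc, cv=3,
                              scoring="neg_mean_absolute_error").mean()
              for m in candidates]
    chosen[c] = candidates[int(np.argmax(scores))]
print({c: type(m).__name__ for c, m in chosen.items()})
```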

17.
Accurate estimation of software project effort is crucial for successful management and control of a software project. Recently, multiple additive regression trees (MART) has been proposed as a novel advance in data mining that extends and improves the classification and regression trees (CART) model using stochastic gradient boosting. This paper empirically evaluates the potential of MART as a novel software effort estimation model, compared with recently published models, in terms of accuracy. The comparison is based on a well-known and respected NASA software project dataset. The results indicate that improved estimation accuracy of software project effort is achieved using MART compared with linear regression, radial basis function neural networks, and support vector regression models.
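MART is stochastic gradient boosting of regression trees, which scikit-learn exposes as GradientBoostingRegressor (setting subsample below 1 gives the stochastic variant). A minimal sketch on synthetic data, comparing it against linear regression on mean absolute error:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, size=(200, 3))                       # synthetic project features
y = 1000 * X[:, 0] ** 2 + 300 * X[:, 1] + rng.normal(0, 30, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)

# subsample < 1 draws a random fraction of rows per boosting stage (stochastic)
mart = GradientBoostingRegressor(subsample=0.7, random_state=5).fit(X_tr, y_tr)
slr = LinearRegression().fit(X_tr, y_tr)

print("MART MAE:", np.mean(np.abs(mart.predict(X_te) - y_te)))
print("SLR  MAE:", np.mean(np.abs(slr.predict(X_te) - y_te)))
```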

18.
Parametric software cost estimation models are based on mathematical relations, obtained from the study of historical software project databases, that are intended to help estimate the effort and time required to develop a software product. Those databases often integrate data coming from projects of a heterogeneous nature, which makes it difficult to obtain a reasonably reliable single parametric model for the range of diverging project sizes and characteristics. A solution proposed elsewhere for this problem was the use of segmented models, in which several models combined into a single one contribute to the estimates depending on the concrete characteristics of the inputs. However, a second problem arises with the use of segmented models, since the assignment of concrete projects to segments or clusters is subject to a degree of fuzziness; that is, a given project can be considered to belong to several segments with different degrees. This paper reports a first exploration of a possible joint solution to both problems, using a segmented model based on fuzzy clusters of the project space. The use of fuzzy clustering makes it possible to obtain a different mathematical model for each cluster and allows the items of a project database to contribute to more than one cluster, while preserving constant-time execution of the estimation process. The results of an evaluation of a concrete model using the ISBSG 8 project database are reported, yielding better adjustment figures than its crisp counterpart.
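The fuzzy-segmented idea, letting each project belong to every cluster with some degree of membership and combining the per-cluster models accordingly, can be sketched as follows. This illustration uses crisp k-means to form the segments and fuzzy c-means-style memberships only at prediction time, which is a simplification of the paper's fuzzy clustering:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
size = rng.uniform(50, 3000, size=(150, 1))            # project size (synthetic)
effort = 6 * size[:, 0] ** 1.05 + rng.normal(0, 400, 150)

km = KMeans(n_clusters=3, n_init=10, random_state=6).fit(size)
# one regression model per segment of the project space
models = [LinearRegression().fit(size[km.labels_ == c], effort[km.labels_ == c])
          for c in range(3)]

def fuzzy_estimate(x, m=2.0):
    """Membership-weighted combination of the per-cluster models (fuzzifier m)."""
    d = np.linalg.norm(km.cluster_centers_ - x, axis=1) + 1e-9
    u = (1 / d) ** (2 / (m - 1))
    u /= u.sum()                                       # fuzzy c-means-style memberships
    return sum(u[c] * models[c].predict([x])[0] for c in range(3))

print(fuzzy_estimate(np.array([800.0])))
```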

19.
Software cost models and effort estimates help project managers allocate resources, control costs and schedules, and improve current practices, leading to projects finished on time and within budget. In the context of Web development these issues are also crucial, and very challenging, given that Web projects have short schedules and a very fluid scope. In the context of Web engineering, few studies have compared the accuracy of different types of cost estimation techniques, with emphasis placed on linear and stepwise regression and case-based reasoning (CBR). To date, only one type of CBR technique has been employed in Web engineering. We believe the results obtained from that study may have been biased, given that other CBR techniques can also be used for effort prediction. Consequently, the first objective of this study is to compare the prediction accuracy of three CBR techniques for estimating the effort to develop Web hypermedia applications and to choose the one with the best estimates. The second objective is to compare the prediction accuracy of the best CBR technique against two commonly used prediction models, namely stepwise regression and regression trees. One dataset was used in the estimation process, and the results showed that the best predictions were obtained with stepwise regression.

20.
This paper proposes a new intelligence paradigm for forecasting that emphasizes numerous software development elements, based on a functional networks forecasting framework. The most common methods proposed in the literature for estimating software development effort are: the lines of code (LOC)-based constructive cost model (COCOMO), function point (FP) based neural networks, regression, and case-based reasoning (CBR). Unfortunately, such forecasting models have numerous drawbacks, namely, their inability to deal with the uncertainties and imprecision present in software projects early in the development life cycle. The main benefit of this study is that it utilizes both the function points and the development environments of recent prominent software development cases, which have a high impact on the success of software development projects. Both the implementation and the learning process are briefly described. We investigate the efficiency of the new framework for predicting software development effort using both simulated and COCOMO real-life databases. The prediction accuracy of the functional networks framework is evaluated and compared with that of the commonly used regression and neural network based models. The results show that the new intelligence paradigm predicts the required effort in the initial stages of software development with reliable performance, and that it outperforms both the regression and neural network based models.
