Similar Documents
20 similar documents found.
1.
In most software development organizations, there is seldom a one-to-one mapping between software developers and development tasks. It is frequently necessary to concurrently assign individuals to multiple tasks and to assign more than one individual to work cooperatively on a single task. A principal goal in making such assignments should be to minimize the effort required to complete each task. But what impact does the manner in which developers are assigned to tasks have on the effort requirements? This paper identifies four task assignment factors: team size, concurrency, intensity, and fragmentation. These four factors are shown to improve the predictive ability of the well-known intermediate COCOMO cost estimation model. A parsimonious effort estimation model is also derived that utilizes a subset of the task assignment factors and unadjusted function points. For the data examined, this parsimonious model is shown to have goodness of fit and quality of estimation superior to that of the COCOMO model, while utilizing fewer cost factors.
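As a rough illustration of the model family being extended, here is a minimal sketch of an intermediate-COCOMO-style estimate (effort = a · KLOC^b · EAF). The a and b values are the published intermediate COCOMO coefficients for a semi-detached project; the extra task-assignment multiplier in the example is a hypothetical placeholder, not one of the calibrated factors derived in the paper.

```python
# Minimal sketch of an intermediate-COCOMO-style estimate.
# a, b are the published intermediate COCOMO coefficients for a "semi-detached"
# project; the task-assignment multiplier in the example call is a hypothetical
# placeholder, not a factor calibrated in the paper.

def cocomo_intermediate(kloc, effort_multipliers, a=3.0, b=1.12):
    """Effort in person-months: a * KLOC^b * product of cost-driver multipliers."""
    eaf = 1.0
    for m in effort_multipliers:
        eaf *= m
    return a * (kloc ** b) * eaf

# Example: 40 KLOC project, two cost drivers rated slightly above nominal,
# plus a hypothetical task-assignment factor (e.g. high fragmentation).
print(cocomo_intermediate(40, [1.08, 1.10, 1.15]))
```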

2.
In the area of software cost estimation, various methods have been proposed to predict the effort or the productivity of a software project. Although most of the proposed methods produce point estimates, in practice it is more realistic and useful for a method to provide interval predictions. In this paper, we explore the possibility of using such a method, ordinal regression, to model the probability of correctly classifying a new project into a cost category. The proposed method is applied to three data sets and is validated with respect to its fitting and predictive accuracy.
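A minimal sketch of ordinal (proportional-odds) regression for cost categories, using statsmodels' OrderedModel as a stand-in; the projects, predictors, and category cut-offs below are synthetic, not the paper's data sets.

```python
# Sketch: proportional-odds (ordinal logistic) regression for cost categories,
# using statsmodels' OrderedModel. The data below are synthetic; the paper's
# data sets and predictors are not reproduced here.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 120
size_kloc = rng.uniform(5, 200, n)
team = rng.integers(2, 15, n)
latent = 0.02 * size_kloc + 0.1 * team + rng.normal(0, 1, n)
# Three ordered cost categories: low < medium < high
cost_cat = pd.Series(pd.cut(latent, bins=[-np.inf, 1.5, 3.0, np.inf],
                            labels=["low", "medium", "high"]))

X = pd.DataFrame({"size_kloc": size_kloc, "team": team})
model = OrderedModel(cost_cat, X, distr="logit")
res = model.fit(method="bfgs", disp=False)

# Probability of each cost category for a new project
new_project = pd.DataFrame({"size_kloc": [80.0], "team": [6]})
print(res.predict(new_project))   # one row of class probabilities
```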

3.
Producing accurate and reliable software effort estimates has always been a challenge for both academic research and the software industry. Data quality is an important factor that affects the estimation accuracy of effort estimation methods. To assess its impact, we investigated the effect of eliminating outliers on the estimation accuracy of commonly used software effort estimation methods. Based on three research questions, we analyzed the influence of outlier elimination on the accuracy of software effort estimation by applying five outlier elimination methods (least trimmed squares, Cook's distance, K-means clustering, box plot, and Mantel leverage metric) and two effort estimation methods (least squares regression and estimation by analogy, with variation of their parameters). Empirical experiments were performed using industrial data sets (ISBSG Release 9, Bank and Stock data sets collected from financial companies, and the Desharnais data set from the PROMISE repository). In addition, the effect of the outlier elimination methods was evaluated with statistical tests (the Friedman test and the Wilcoxon signed-rank test). The experimental results derived from the evaluation criteria showed no substantial difference between the software effort estimation results with and without outlier elimination. However, statistical analysis indicated that outlier elimination leads to a significant improvement in estimation accuracy on the Stock data set (for some combinations of outlier elimination and effort estimation methods). Although outlier elimination did not lead to a significant improvement on the other data sets, our graphical analysis of errors showed that it can improve the likelihood of producing more accurate effort estimates for new software projects. From a practical point of view, it is therefore advisable to consider outlier elimination and to conduct a detailed analysis of the estimation results to improve the accuracy of software effort estimation in software organizations.
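A minimal sketch of one of the listed outlier elimination methods, the box plot (IQR) rule, applied to effort before fitting a simple least-squares model; the column names and data are hypothetical, and the paper's actual preprocessing of the industrial data sets is more involved.

```python
# Sketch: box-plot (IQR) outlier elimination on effort before fitting a simple
# least-squares estimation model. Column names are hypothetical; the ISBSG /
# Desharnais preprocessing used in the paper is more involved than this.
import numpy as np
import pandas as pd

def remove_boxplot_outliers(df, column, k=1.5):
    """Drop rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return df[(df[column] >= lo) & (df[column] <= hi)]

rng = np.random.default_rng(1)
projects = pd.DataFrame({"size_fp": rng.uniform(50, 1500, 200)})
projects["effort"] = 8 * projects["size_fp"] * rng.lognormal(0, 0.4, 200)

cleaned = remove_boxplot_outliers(projects, "effort")
# Least-squares fit of log(effort) on log(size) with and without outliers
for name, data in [("all", projects), ("no outliers", cleaned)]:
    slope, intercept = np.polyfit(np.log(data["size_fp"]), np.log(data["effort"]), 1)
    print(name, len(data), round(slope, 3), round(intercept, 3))
```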

4.
Classification techniques for metric-based software development
Managing software development and maintenance projects requires predictions about components of the software system that are likely to have a high error rate or to need high development effort. The value of any classification is determined by the accuracy and cost of such predictions. The paper investigates the hypothesis that fuzzy classification applied to criticality prediction provides better results than other classification techniques that have been introduced in this area. Five techniques for identifying error-prone software components are compared, namely Pareto classification, crisp classification trees, factor-based discriminant analysis, neural networks, and fuzzy classification. The comparison is illustrated with experimental results from the development of industrial real-time projects. A module quality model (with respect to changes) provides both quality of fit (on past data) and predictive accuracy (on ongoing projects). Fuzzy classification showed the best results in terms of overall predictive accuracy.
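A minimal sketch of the simplest of the compared techniques, Pareto classification, which flags the most critical share of modules by a single metric; the 20 percent cut-off and the "changes" metric below are assumptions, not the study's calibration.

```python
# Sketch of Pareto classification: flag the 20% of modules with the highest
# value of a code metric (here a hypothetical change count) as critical.
import numpy as np

def pareto_classify(metric_values, fraction=0.2):
    """Return a boolean array marking the top `fraction` of modules."""
    metric_values = np.asarray(metric_values)
    cutoff = np.quantile(metric_values, 1.0 - fraction)
    return metric_values >= cutoff

changes_per_module = [0, 3, 1, 12, 7, 0, 25, 2, 4, 9]
print(pareto_classify(changes_per_module))
```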

5.
To date, most research in software effort estimation has not taken chronology into account when selecting projects for training and validation sets. A chronological split uses a project’s starting and completion dates, such that any model that estimates effort for a new project p uses as its training set only projects completed prior to p’s starting date. A study in 2009 (“S3”) investigated the use of a chronological split taking into account a project’s age. The research question was whether using a training set containing only the most recent past projects (a “moving window” of recent projects) leads to more accurate estimates than using the entire history of projects completed prior to the starting date of a new project. S3 found that moving windows could improve the accuracy of estimates. The study described herein replicates S3 using three different and independent data sets. Estimation models were built using regression, and accuracy was measured using absolute residuals. The results contradict S3: they show no gain in estimation accuracy when using windows for effort estimation. This is a surprising result, as the intuition that recent data should be more helpful than old data for effort estimation is not supported. Several factors, discussed in this paper, might have contributed to these contradicting results. Our future work includes replicating this study on other data sets, to better understand when using windows is a suitable choice for software companies.
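A minimal sketch of how a chronological training set and a moving window can be selected for a new project p; the project records and the window size are hypothetical.

```python
# Sketch: building a chronological training set for a new project p, either
# using the full history or only a moving window of the N most recently
# completed projects. Field names are hypothetical.
from datetime import date

projects = [
    {"name": "A", "start": date(2019, 1, 1), "end": date(2019, 6, 1), "effort": 900},
    {"name": "B", "start": date(2019, 3, 1), "end": date(2020, 1, 15), "effort": 2100},
    {"name": "C", "start": date(2020, 2, 1), "end": date(2020, 9, 1), "effort": 1300},
    {"name": "D", "start": date(2020, 5, 1), "end": date(2021, 2, 1), "effort": 1700},
]

def training_set(history, new_start, window=None):
    """Projects completed before the new project's start date; if `window`
    is given, keep only the `window` most recently completed ones."""
    done = [p for p in history if p["end"] < new_start]
    done.sort(key=lambda p: p["end"])
    return done if window is None else done[-window:]

new_project_start = date(2021, 3, 1)
print([p["name"] for p in training_set(projects, new_project_start)])            # full history
print([p["name"] for p in training_set(projects, new_project_start, window=2)])  # moving window
```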

6.
Cost estimation and effort allocation are the key challenges for successful project planning and management in software development. Therefore, both industry and the research community have been working on various models and techniques to accurately predict the cost of projects. Recently, researchers have started debating whether the prediction performance depends on the structure of data rather than the models used. In this article, we focus on a new aspect of data homogeneity, “cross- versus within-application domain”, and investigate what kind of training data should be used for software cost estimation in the embedded systems domain. In addition, we examine the effect of training data set size on the prediction performance. Based on our empirical results, we conclude that it is better to use cross-domain data for embedded software cost estimation and that the optimum training data size depends on the method used.
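A minimal sketch of a within-domain versus cross-domain comparison for a simple log-log effort model; the domains, productivities, and accuracy measure (mean absolute residual) are synthetic stand-ins, not the article's embedded-systems data or evaluation setup.

```python
# Sketch: comparing within-domain and cross-domain training data for a simple
# log-log effort model. Domains, sizes and efforts are synthetic stand-ins for
# the embedded-systems data used in the article.
import numpy as np

rng = np.random.default_rng(2)

def make_domain(n, productivity):
    size = rng.uniform(10, 300, n)                        # e.g. KLOC
    effort = productivity * size * rng.lognormal(0, 0.3, n)
    return np.log(size), np.log(effort)

x_emb, y_emb = make_domain(60, 12.0)    # "embedded" target domain
x_oth, y_oth = make_domain(200, 9.0)    # other application domains

def fit_and_mar(x_train, y_train, x_test, y_test):
    b, a = np.polyfit(x_train, y_train, 1)                # slope, intercept
    pred = np.exp(a + b * x_test)
    return np.mean(np.abs(np.exp(y_test) - pred))         # mean absolute residual

# Hold out half of the embedded projects for testing.
x_tr, y_tr = x_emb[:30], y_emb[:30]
x_te, y_te = x_emb[30:], y_emb[30:]
print("within-domain:", fit_and_mar(x_tr, y_tr, x_te, y_te))
print("cross-domain: ", fit_and_mar(np.concatenate([x_tr, x_oth]),
                                    np.concatenate([y_tr, y_oth]), x_te, y_te))
```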

7.
This contribution describes an automatic technique to detect suitable Lyapunov functions for nonlinear systems. The theoretical basis for the work is Lyapunov’s Direct Method, which provides sufficient conditions for stability of equilibrium points. In our proposed approach, genetic programming (GP) is used to search for suitable Lyapunov functions, that is, those that best predict the true domain of attraction. In the work presented here, our GP approach has been extended by defining a target function accounting for the Lyapunov function level sets.
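A minimal sketch of the numerical check behind Lyapunov's Direct Method for one fixed candidate V (V positive and its derivative along the system negative on sampled states); the system and candidate below are illustrative, and the GP search and the level-set-based target function are not reproduced.

```python
# Sketch: numerically checking Lyapunov's Direct Method conditions for a
# candidate function V on sampled states. This is only the fitness-style check
# a GP individual might be scored with, not the GP search itself.
import numpy as np

def f(x):
    """Example nonlinear system: x1' = -x1 + x2**2, x2' = -x2."""
    return np.array([-x[0] + x[1] ** 2, -x[1]])

def V(x):
    """Candidate Lyapunov function (quadratic form)."""
    return x[0] ** 2 + x[1] ** 2

def V_dot(x, eps=1e-6):
    """Directional derivative of V along f, via central finite differences."""
    grad = np.array([(V(x + np.eye(2)[i] * eps) - V(x - np.eye(2)[i] * eps)) / (2 * eps)
                     for i in range(2)])
    return grad @ f(x)

rng = np.random.default_rng(3)
samples = rng.uniform(-0.5, 0.5, size=(1000, 2))
ok = sum(1 for x in samples if V(x) > 0 and V_dot(x) < 0)
print(f"{ok}/1000 sample points satisfy V > 0 and dV/dt < 0")
```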

8.
Models are developed to estimate lines of code and function counts directly from user application features of process control systems early in the software development lifecycle. Since the application features are known with a reasonable degree of confidence during the early stages of development, it is possible to extend the use of the constructive cost model (COCOMO) and the function-points-based approach to early software cost estimation. Alternative feature-based models that estimate size and effort using application features and productivity factors are developed. The feature-based models are shown to estimate software effort with the least error.

9.
Size is a main parameter for estimating the effort and cost of software applications in general and mobile applications in particular, and estimating effort, cost, and time is a key step in the life cycle of a software project. To create a sound project schedule, it is therefore important to obtain these estimates as early as possible in the software development life cycle. Many methods have been employed to estimate the size and effort of mobile applications, but these methods still do not meet customers' expectations. In this paper, we present a new size measurement method, Mobile COSMIC Function Points (MCFP), based on the COSMIC approach, which is a primary factor in effort estimation for mobile application development. The paper analyzes the possibility of using a combination of functional and non-functional parameters, including both Mobile Technical Complexity Factors (MTCF) and Mobile Environmental Complexity Factors (MECF), for mobile application size prediction and hence effort estimation. For this study, thirty-six mobile applications were analyzed and their sizes and efforts were compared by applying the new effort estimation approach. In the context of mobile applications, few investigations have compared the effectiveness of COSMIC, function points, and the proposed approach, the “COSMIC Plus Effort Estimation Model (CPEEM)”. The main goal of this paper is to investigate whether the inclusion of non-functional parameters affects the functional size of mobile application development. When estimating effort using the proposed approach, the results were promising for mobile applications compared with the results of the other two approaches.
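A deliberately simplified, hypothetical sketch of how technical and environmental complexity ratings could scale a COSMIC functional size, in the style of the classic FPA value adjustment factor; the paper's actual MTCF/MECF weighting in the CPEEM model is not specified here and may differ.

```python
# Hypothetical sketch: scaling a COSMIC functional size (CFP) by technical and
# environmental complexity ratings, in the style of the classic FPA value
# adjustment factor. The actual MTCF/MECF weighting of the paper's CPEEM model
# is not reproduced here.
def adjusted_mobile_size(cfp, mtcf_ratings, mecf_ratings):
    """cfp: functional size in COSMIC Function Points.
    mtcf_ratings / mecf_ratings: lists of 0..5 ratings for mobile technical
    and environmental complexity factors (hypothetical scales)."""
    total_influence = sum(mtcf_ratings) + sum(mecf_ratings)
    adjustment = 0.65 + 0.01 * total_influence   # FPA-style adjustment, used here as an assumption
    return cfp * adjustment

# Example: a 120 CFP app with five technical and three environmental factor ratings
print(adjusted_mobile_size(120, [3, 4, 2, 5, 1], [2, 3, 1]))
```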

10.
Effective management of the software development process requires that management be able to estimate total development effort and cost. One of the fundamental problems associated with effort and cost estimation is the a priori estimation of software size. Function point analysis has emerged over the last decade as a popular tool for this task. Criticisms of the method have arisen relating to the way in which function counts are calculated and to the impact of the processing complexity adjustment on the function point count. SPQR/20 function points, among others, are claimed to overcome some of these criticisms. The SPQR/20 function point method is compared to traditional function point analysis as a measure of software size in an empirical study of MIS environments. In a study of 64 projects in one organization, both methods appeared equally satisfactory. However, one method should be used consistently, since the individual counts differ considerably.

11.
Several popular cost estimation models like COCOMO and function points use adjustment variables, such as software complexity and platform, to modify original estimates and arrive at final estimates. Using data on 666 programs from 15 software projects, this study empirically tests a research model of the influence of three adjustment variables (software complexity, computer platform, and program type, i.e., batch or online programs) on software effort. The results confirm that all three adjustment variables have a significant effect on effort. Further, multiple comparison of means points to two other results for the data examined: batch programs involve significantly higher software effort than online programs, and programs rated as complex have significantly higher effort than programs rated as average.
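A minimal sketch of the kind of analysis described: a linear model for effort with categorical adjustment variables plus a multiple comparison of means (here Tukey's HSD via statsmodels); the data frame is synthetic, and the study's exact statistical procedure may differ.

```python
# Sketch: testing the effect of adjustment variables on effort with a linear
# model plus Tukey's multiple comparison of means, using statsmodels. The data
# frame below is synthetic; the study used 666 programs from 15 projects.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({
    "complexity": rng.choice(["simple", "average", "complex"], n),
    "prog_type": rng.choice(["batch", "online"], n),
})
base = {"simple": 40, "average": 60, "complex": 95}
df["effort"] = [base[c] + (25 if t == "batch" else 0) + rng.normal(0, 15)
                for c, t in zip(df["complexity"], df["prog_type"])]

model = smf.ols("effort ~ C(complexity) + C(prog_type)", data=df).fit()
print(model.summary().tables[1])                      # coefficient table
print(pairwise_tukeyhsd(df["effort"], df["complexity"]))
```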

12.
Constructing an accurate effort prediction model is a challenge in software engineering. This paper presents three Bayesian statistical software effort prediction models for database-oriented software systems developed using a specific 4GL tool suite. The models consist of specification-based software size metrics and a development team productivity metric. They are constructed from the subjective knowledge of a human expert and calibrated using empirical data collected from 17 software systems developed in the target environment. The models' predictive accuracy is evaluated using subsets of the same data that were not used for calibration. The results show that the models achieve very good predictive accuracy in terms of MMRE and pred measures, confirming that Bayesian statistical models can predict effort successfully in the target environment. In comparison with commonly used multiple linear regression models, the Bayesian statistical models' predictive accuracy is equivalent in general. However, when the number of software systems used for calibration is smaller than five, the predictive accuracy of the best Bayesian statistical models is significantly better than that of the multiple linear regression model. This result suggests that Bayesian statistical models are a better choice when software organizations or practitioners do not possess sufficient empirical data for model calibration. The authors expect these findings to encourage more researchers to investigate the use of Bayesian statistical models for predicting software effort.
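A minimal sketch of the two accuracy measures named in the abstract, MMRE and pred(l); the effort values below are illustrative, not the paper's data.

```python
# Sketch of the two accuracy measures mentioned: MMRE (mean magnitude of
# relative error) and pred(l) (fraction of estimates within l*100% of the
# actual effort). Values below are illustrative.
def mmre(actual, predicted):
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def pred(actual, predicted, level=0.25):
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)

actual_effort    = [320, 870, 150, 410, 990]
estimated_effort = [300, 940, 190, 400, 850]
print("MMRE:", round(mmre(actual_effort, estimated_effort), 3))
print("pred(0.25):", pred(actual_effort, estimated_effort))
```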

13.
Three innovations are proposed for dynamically varying the population size during the run of the genetic programming (GP) system. These are related to what is called Dynamic Population Variation (DPV), where the size of the population is dynamically varied using a heuristic feedback mechanism during the execution of the GP with the aim of reducing the computational effort compared with Standard Genetic Programming (SGP). Firstly, previously developed population variation pivot functions are controlled by four newly proposed characteristic measures. Secondly, a new gradient based pivot function is added to this dynamic population variation method in conjunction with the four proposed measures. Thirdly, a formula for population variations that is independent of special constants is introduced and evaluated. The efficacy of these innovations is examined using a comprehensive range of standard representative problems. It is shown that the new ideas do have the capacity to provide solutions at a lower computational cost compared with standard genetic programming and previously reported algorithms such as the plague operator and the static population variation schemes previously introduced by the authors.
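A generic, hypothetical sketch of a dynamic-population-variation loop in which the population size is adjusted each generation from a feedback signal (here, whether the best fitness improved); the paper's pivot functions and the four characteristic measures are more elaborate than this.

```python
# Generic sketch of a dynamic-population-variation loop: population size is
# adjusted each generation from a feedback signal (here, change in best
# fitness). The paper's pivot functions and characteristic measures are more
# elaborate; this is only an illustrative assumption.
import random

def evaluate(individual):
    """Toy fitness: maximise the sum of the genome."""
    return sum(individual)

def random_individual(length=10):
    return [random.random() for _ in range(length)]

pop_size, min_size, max_size = 100, 20, 400
population = [random_individual() for _ in range(pop_size)]
prev_best = max(map(evaluate, population))

for generation in range(50):
    # Simple (mutation-only) reproduction step for illustration.
    population = [
        [g + random.gauss(0, 0.05) for g in random.choice(population)]
        for _ in range(pop_size)
    ]
    best = max(map(evaluate, population))
    # Feedback: shrink while improving (save effort), grow when stagnating.
    pop_size = int(pop_size * (0.95 if best > prev_best else 1.10))
    pop_size = max(min_size, min(max_size, pop_size))
    prev_best = best

print("final population size:", pop_size, "best fitness:", round(prev_best, 3))
```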

14.
In the context of software cost estimation, system size is widely taken as a main driver of system development effort. However, other structural design properties, such as coupling, cohesion, and complexity, have been suggested as additional cost factors. Using effort data from an object-oriented development project, we empirically investigate the relationship between class size and the development effort for a class, and what additional impact structural properties such as class coupling have on effort. The paper proposes a practical, repeatable, and accurate analysis procedure to investigate relationships between structural properties and development effort. Results indicate that fairly accurate predictions of class effort can be made from simple measures of the class interface size alone (mean MREs below 30 percent). Effort predictions at the system level are even more accurate: using bootstrapping, the estimated 95 percent confidence interval for MREs is 3 to 23 percent. More sophisticated coupling and cohesion measures do not improve these predictions to a practically significant degree. However, the use of hybrid models combining Poisson regression and CART regression trees clearly improves the accuracy of the models compared to using Poisson regression alone.
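A minimal sketch of a Poisson-regression-plus-CART hybrid for class effort; how the two models are actually combined in the paper is not reproduced here, and fitting the tree to the Poisson residuals, as below, is only one plausible assumption.

```python
# Sketch: a hybrid of Poisson regression and a CART regression tree for class
# effort. How the two are combined in the paper is not reproduced here; as an
# assumption, the tree below is fitted to the residuals of the Poisson model.
import numpy as np
import statsmodels.api as sm
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
n = 200
interface_size = rng.integers(2, 60, n)       # e.g. number of public methods
coupling = rng.integers(0, 20, n)
effort_hours = rng.poisson(2.0 * interface_size + 1.5 * coupling + 5)

X = sm.add_constant(np.column_stack([interface_size, coupling]))
poisson = sm.GLM(effort_hours, X, family=sm.families.Poisson()).fit()
residuals = effort_hours - poisson.fittedvalues

features = np.column_stack([interface_size, coupling])
tree = DecisionTreeRegressor(max_depth=3).fit(features, residuals)
hybrid_pred = poisson.fittedvalues + tree.predict(features)
mre = np.abs(effort_hours - hybrid_pred) / np.maximum(effort_hours, 1)
print("mean MRE of hybrid model:", round(mre.mean(), 3))
```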

15.
In this paper we study whether software development effort exhibits a Cobb–Douglas functional form with respect to team size and software size. We empirically test this relationship using a real-world software engineering data set containing over 500 software projects. The results of our experiments indicate that the hypothesized Cobb–Douglas functional form for software development effort with respect to team size and software size holds. We also find an increasing-returns-to-scale relationship between software size, team size, and software development effort.
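A minimal sketch of testing a Cobb-Douglas form, Effort = A · TeamSize^alpha · Size^beta, by ordinary least squares on the log-log transformed data; the data are synthetic stand-ins for the 500+ projects used in the paper, and alpha + beta > 1 indicates increasing returns to scale.

```python
# Sketch: testing a Cobb-Douglas form Effort = A * TeamSize^alpha * Size^beta
# by fitting a log-log linear regression. Data are synthetic stand-ins for the
# 500+ real projects used in the paper.
import numpy as np

rng = np.random.default_rng(6)
n = 500
team_size = rng.uniform(2, 30, n)
size_kloc = rng.uniform(5, 400, n)
effort = 3.0 * team_size ** 0.4 * size_kloc ** 0.9 * rng.lognormal(0, 0.2, n)

X = np.column_stack([np.ones(n), np.log(team_size), np.log(size_kloc)])
coef, *_ = np.linalg.lstsq(X, np.log(effort), rcond=None)
log_A, alpha, beta = coef
print(f"A = {np.exp(log_A):.2f}, alpha = {alpha:.2f}, beta = {beta:.2f}")
print("returns to scale (alpha + beta):", round(alpha + beta, 2))
```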

16.
17.
Genetic improvement has been used to improve functional and non-functional properties of software. In this paper, we propose a new approach that applies genetic programming (GP)-based genetic improvement to trade between functional and non-functional properties of existing software. The paper investigates opportunities for improving non-functional parameters such as execution time, code size, or power consumption of median functions implemented using comparator networks. In general, it is impossible to improve the non-functional parameters of a median function without accepting occasional errors in the results, because optimal implementations are already available. To address this issue, we propose a method providing suitable compromises between accuracy, execution time, and power consumption. Traditionally, a randomly generated set of test vectors is employed to assess the quality of GP individuals. We demonstrate that such an approach may produce biased solutions if the test vectors are generated inappropriately. In order to measure the accuracy of determining a median value and avoid such bias, we propose and formally analyze new quality metrics based on the positional error calculated using the permutation principle introduced in this paper. It is shown that the proposed method enables the discovery of solutions with significant improvements in execution time, power consumption, or size with respect to the accurate median function while keeping errors at a moderate level. Non-functional properties of the discovered solutions are estimated using data sets and validated by physical measurements on microcontrollers. The benefits of the evolved implementations are demonstrated on two real-world problems: sensor data processing and image processing. It is concluded that data processing software modules offer a great opportunity for genetic improvement. The results reveal that in many cases it is not even necessary to determine the median value exactly, which helps to reduce power consumption or increase performance. The discovered implementations of accurate as well as approximate median functions are available as C functions for download and can be employed in a custom application (http://www.fit.vutbr.cz/research/groups/ehw/median).
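A simplified sketch of a positional-error measure for an approximate median: how many positions away from the true median rank the returned element lies in the sorted input; the paper's permutation-based metrics are defined formally there, and the cheap 9-input "median of medians-of-three" below is only an illustrative approximate implementation.

```python
# Sketch: a positional-error measure for an approximate median function: how
# many positions away from the true median rank the returned element lies in
# the sorted input (assuming distinct values). The paper's permutation-based
# metric is defined formally there; this is a simplified assumption.
def positional_error(values, approx_median):
    ranked = sorted(values)
    median_rank = (len(values) - 1) // 2
    return abs(ranked.index(approx_median) - median_rank)

def approx_median9(v):
    """A deliberately cheap 9-input 'median': median of three medians-of-three."""
    med3 = lambda a, b, c: sorted((a, b, c))[1]
    return med3(med3(*v[0:3]), med3(*v[3:6]), med3(*v[6:9]))

sample = [1, 2, 9, 3, 4, 8, 5, 6, 7]
m = approx_median9(sample)
print("returned:", m, "positional error:", positional_error(sample, m))
```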

18.
Estimation of software size is a crucial activity among the tasks of software management. Work planning and subsequent estimations of the required effort are based on the estimate of the size of the software product. Software size can be measured in several ways: lines of code (LOC) is a common measure and is usually one of the independent variables in effort estimation equations, and several methods exist for estimating the final LOC count of a software system in its early stages. We report the results of the validation of the component-based method (initially proposed by Verner and Tate, 1988) for software sizing. This was done through the analysis of 46 projects involving more than 100,000 LOC of a fourth-generation language. We present several conclusions concerning the predictive capabilities of the method. We observed that the component-based method behaves reasonably, although not as well as expected compared with “global” methods such as Mark II function points for software size prediction. The main factor observed to affect performance is the type of component.

19.
Machine learning algorithms such as genetic programming (GP) can evolve biased classifiers when data sets are unbalanced. Data sets are unbalanced when at least one class is represented by only a small number of training examples (called the minority class) while other classes make up the majority. In this scenario, classifiers can have good accuracy on the majority class but very poor accuracy on the minority class(es) due to the influence that the larger majority class has on traditional training criteria in the fitness function. This paper aims to both highlight the limitations of the current GP approaches in this area and develop several new fitness functions for binary classification with unbalanced data. Using a range of real-world classification problems with class imbalance, we empirically show that these new fitness functions evolve classifiers with good performance on both the minority and majority classes. Our approaches use the original unbalanced training data in the GP learning process, without the need to artificially balance the training examples from the two classes (e.g., via sampling).
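A minimal sketch of a class-balance-aware fitness function for binary classification: the average of the per-class accuracies, so the majority class cannot dominate the score; the paper proposes several such fitness functions, and this is only the simplest balanced variant, not necessarily one of theirs.

```python
# Sketch: a class-balance-aware fitness function for binary GP classifiers:
# the average of the accuracies on each class (so the majority class cannot
# dominate), instead of overall accuracy.
def balanced_fitness(predictions, labels, minority=1):
    tp = sum(1 for p, y in zip(predictions, labels) if y == minority and p == minority)
    tn = sum(1 for p, y in zip(predictions, labels) if y != minority and p != minority)
    n_min = sum(1 for y in labels if y == minority)
    n_maj = len(labels) - n_min
    return 0.5 * (tp / n_min + tn / n_maj)

# A classifier that always predicts the majority class scores only 0.5 here,
# although its plain accuracy would be 0.9 on this 9:1 imbalanced sample.
labels      = [0] * 90 + [1] * 10
predictions = [0] * 100
print(balanced_fitness(predictions, labels))
```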

20.
The demand for the development of good-quality software has grown rapidly in the last few years. This is leading to an increase in the use of machine learning methods for analyzing and assessing public domain data sets. These methods can be used to develop models for estimating software quality attributes such as fault proneness, maintenance effort, and testing effort. Software fault prediction in the early phases of software development can help and guide software practitioners to focus the available testing resources on the weaker areas during development. This paper analyzes and compares a statistical method and six machine learning methods for fault prediction. These methods (Decision Tree, Artificial Neural Network, Cascade Correlation Network, Support Vector Machine, Group Method of Data Handling, and Gene Expression Programming) are empirically validated to find the relationship between static code metrics and the fault proneness of a module. To assess and compare the models built using regression and the machine learning methods, we used two publicly available data sets, AR1 and AR6. We compared the predictive capability of the models using the Area Under the Curve (measured from Receiver Operating Characteristic (ROC) analysis). The study confirms the predictive capability of the machine learning methods for software fault prediction. The results show that the Area Under the Curve of the model built with the Decision Tree method is 0.8 and 0.9 (for the AR1 and AR6 data sets, respectively), making it a better model than those built using logistic regression and the other machine learning methods.
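A minimal sketch of comparing a decision tree and logistic regression by the area under the ROC curve with scikit-learn; the metrics-like features and fault labels below are synthetic, not the AR1/AR6 data sets, and only two of the compared methods are shown.

```python
# Sketch: comparing a decision tree and logistic regression on a fault data set
# via the area under the ROC curve, with scikit-learn. The data below are
# synthetic; the paper used the public AR1 and AR6 data sets and more methods.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n = 400
X = np.column_stack([rng.poisson(30, n),      # e.g. lines of code
                     rng.poisson(5, n),       # e.g. cyclomatic complexity
                     rng.poisson(10, n)])     # e.g. operand count
logit = 0.03 * X[:, 0] + 0.25 * X[:, 1] + 0.02 * X[:, 2] - 3.5
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # fault-prone or not

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for name, clf in [("decision tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
                  ("logistic regression", LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(name, round(auc, 3))
```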
