期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Application of neural networks for predicting program faults 总被引：1，自引：0，他引：1

Taghi M. Khoshgoftaar Abhijit S. Pandya David L. Lanning 《Annals of Software Engineering》1995,1(1):141-154

Accurately predicting the number of faults in program modules is a major problem in the quality control of large software development efforts. Some software complexity metrics are closely related to the distribution of faults across program modules. Using these relationships, software engineers develop models that provide early estimates of quality metrics that do not become available until late in the development cycle. By considering these early estimates, software engineers can take actions to avoid or prepare for emerging quality problems. Most often, the predictive models are based upon multiple regression analysis. However, measures of software quality and complexity exhibit systematic departures from the assumptions of these analyses. With extreme violations of these assumptions, multiple regression models become unstable and lose most of their predictive quality. Since neural network models carry no data assumptions, these models could be more appropriate than regression models for modeling software faults. In this paper, we explore a neural network methodology for developing models that predict the number of faults in program modules. We apply this methodology to develop neural network models based upon data collected during the development of two commercial software systems. After developing neural network models, we apply multiple linear regression methods to develop regression models on the same data. For the data sets considered, the neural network methodology produced better predictive models in terms of both quality of fit and predictive quality. 相似文献

2.

Source code size estimation approaches for object-oriented systems from UML class diagrams: A comparative study

《Information and Software Technology》2014,56(2):220-237

BackgroundSource code size in terms of SLOC (source lines of code) is the input of many parametric software effort estimation models. However, it is unavailable at the early phase of software development.ObjectiveWe investigate the accuracy of early SLOC estimation approaches for an object-oriented system using the information collected from its UML class diagram available at the early software development phase.MethodWe use different modeling techniques to build the prediction models for investigating the accuracy of six types of metrics to estimate SLOC. The used techniques include linear models, non-linear models, rule/tree-based models, and instance-based models. The investigated metrics are class diagram metrics, predictive object points, object-oriented project size metric, fast&&serious class points, objective class points, and object-oriented function points.ResultsBased on 100 open-source Java systems, we find that the prediction model built using object-oriented project size metric and ordinary least square regression with a logarithmic transformation achieves the highest accuracy (mean MMRE = 0.19 and mean Pred(25) = 0.74).ConclusionWe should use object-oriented project size metric and ordinary least square regression with a logarithmic transformation to build a simple, accurate, and comprehensible SLOC estimation model. 相似文献

3.

A hybrid heuristic approach to optimize rule-based software quality estimation models

D. Azar H. Harmanani R. Korkmaz 《Information and Software Technology》2009,51(9):1365-1376

Software quality is defined as the degree to which a software component or system meets specified requirements and specifications. Assessing software quality in the early stages of design and development is crucial as it helps reduce effort, time and money. However, the task is difficult since most software quality characteristics (such as maintainability, reliability and reusability) cannot be directly and objectively measured before the software product is deployed and used for a certain period of time. Nonetheless, these software quality characteristics can be predicted from other measurable software quality attributes such as complexity and inheritance. Many metrics have been proposed for this purpose. In this context, we speak of estimating software quality characteristics from measurable attributes. For this purpose, software quality estimation models have been widely used. These take different forms: statistical models, rule-based models and decision trees. However, data used to build such models is scarce in the domain of software quality. As a result, the accuracy of the built estimation models deteriorates when they are used to predict the quality of new software components. In this paper, we propose a search-based software engineering approach to improve the prediction accuracy of software quality estimation models by adapting them to new unseen software products. The method has been implemented and favorable result comparisons are reported in this work. 相似文献

4.

Considering the Fault Dependency Concept with Debugging Time Lag in Software Reliability Growth Modeling Using a Power Function of Testing Time

V. B. Singh Kalpana Yadav Reecha Kapur V.S.S. Yadavalli 《国际自动化与计算杂志》2007,4(4):359-368

Since the early 1970s tremendous growth has been seen in the research of software reliability growth modeling.In general, software reliability growth models (SRGMs) are applicable to the late stages of testing in software development and they can provide useful information about how to improve the reliability of software products.A number of SRGMs have been proposed in the literature to represent time-dependent fault identification/removal phenomenon;still new models are being proposed that could fit a greater number of reliability growth curves.Often,it is assumed that detected faults axe immediately corrected when mathematical models are developed.This assumption may not be realistic in practice because the time to remove a detected fault depends on the complexity of the fault,the skill and experience of the personnel,the size of the debugging team,the technique,and so on.Thus,the detected fault need not be immediately removed,and it may lag the fault detection process by a delay effect factor.In this paper,we first review how different software reliability growth models have been developed,where fault detection process is dependent not only on the number of residual fault content but also on the testing time,and see how these models can be reinterpreted as the delayed fault detection model by using a delay effect factor.Based on the power function of the testing time concept,we propose four new SRGMs that assume the presence of two types of faults in the software:leading and dependent faults.Leading faults are those that can be removed upon a failure being observed.However,dependent faults are masked by leading faults and can only be removed after the corresponding leading fault has been removed with a debugging time lag.These models have been tested on real software error data to show its goodness of fit,predictive validity and applicability. 相似文献

5.

An investigation of artificial neural networks based prediction systems in software project management

Iris Fabiana de Barcelos Tronto^{Author Vitae} José Demísio Simões da Silva Author Vitae Author Vitae 《Journal of Systems and Software》2008,81(3):356-367

A critical issue in software project management is the accurate estimation of size, effort, resources, cost, and time spent in the development process. Underestimates may lead to time pressures that may compromise full functional development and the software testing process. Likewise, overestimates can result in noncompetitive budgets. In this paper, artificial neural network and stepwise regression based predictive models are investigated, aiming at offering alternative methods for those who do not believe in estimation models. The results presented in this paper compare the performance of both methods and indicate that these techniques are competitive with the APF, SLIM, and COCOMO methods. 相似文献

6.

MND-SCEMP: an empirical study of a software cost estimation modeling process in the defense domain

Taeho Lee Taewan Gu Jongmoon Baik 《Empirical Software Engineering》2014,19(1):213-240

The primary focus of weapon systems research and development has moved from a hardware base to a software base and the cost of software development is increasing gradually. An accurate estimation of the cost of software development is now a very important task in the defense domain. However, existing models and tools for software cost estimation are not suitable for the defense domain due to problems of accuracy. Thus, it is necessary to develop cost estimation models that are appropriate to specific domains. Furthermore, most studies of methodology development are aligned with generic methodologies that do not consider the pertinent factors to specific domains, whereas new methodologies should reflect specific domains. In this study, we apply two generic methodologies to the development of a software cost estimation model, before suggesting an integrated modeling process specifically for the national defense domain. To validate our proposed modeling process, we performed an empirical study of 113 software development projects on weapon systems in Korea. A software cost estimation model was developed by applying the proposed modeling process. The MMRE value of this model was 0.566 while the accuracy was appropriate for use. We conclude that the modeling process and software cost estimation model developed in this study is suitable for estimating resource requirements during weapon system development in South Korea’s national defense domain. This modeling process and model may facilitate more accurate resource estimation by project planners, which will lead to more successful project execution. 相似文献

7.

Comparative Assessment of Software Quality Classification Techniques: An Empirical Case Study 总被引：4，自引：0，他引：4

Taghi M. Khoshgoftaar Naeem Seliya 《Empirical Software Engineering》2004,9(3):229-257

Software metrics-based quality classification models predict a software module as either fault-prone (fp) or not fault-prone (nfp). Timely application of such models can assist in directing quality improvement efforts to modules that are likely to be fp during operations, thereby cost-effectively utilizing the software quality testing and enhancement resources. Since several classification techniques are available, a relative comparative study of some commonly used classification techniques can be useful to practitioners. We present a comprehensive evaluation of the relative performances of seven classification techniques and/or tools. These include logistic regression, case-based reasoning, classification and regression trees (CART), tree-based classification with S-PLUS, and the Sprint-Sliq, C4.5, and Treedisc algorithms. The use of expected cost of misclassification (ECM), is introduced as a singular unified measure to compare the performances of different software quality classification models. A function of the costs of the Type I (a nfp module misclassified as fp) and Type II (a fp module misclassified as nfp) misclassifications, ECM is computed for different cost ratios. Evaluating software quality classification models in the presence of varying cost ratios is important, because the usefulness of a model is dependent on the system-specific costs of misclassifications. Moreover, models should be compared and preferred for cost ratios that fall within the range of interest for the given system and project domain. Software metrics were collected from four successive releases of a large legacy telecommunications system. A two-way ANOVA randomized-complete block design modeling approach is used, in which the system release is treated as a block, while the modeling method is treated as a factor. It is observed that predictive performances of the models is significantly different across the system releases, implying that in the software engineering domain prediction models are influenced by the characteristics of the data and the system being modeled. Multiple-pairwise comparisons are performed to evaluate the relative performances of the seven models for the cost ratios of interest to the case study. In addition, the performance of the seven classification techniques is also compared with a classification based on lines of code. The comparative approach presented in this paper can also be applied to other software systems. 相似文献

8.

Evolutionary Sampling and Software Quality Modeling of High-Assurance Systems 总被引：1，自引：0，他引：1

《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2009,39(5):1097-1107

Software quality modeling for high-assurance systems, such as safety-critical systems, is adversely affected by the skewed distribution of fault-prone program modules. This sparsity of defect occurrence within the software system impedes training and performance of software quality estimation models. Data sampling approaches presented in data mining and machine learning literature can be used to address the imbalance problem. We present a novel genetic algorithm-based data sampling method, named Evolutionary Sampling, as a solution to improving software quality modeling for high-assurance systems. The proposed solution is compared with multiple existing data sampling techniques, including random undersampling, one-sided selection, Wilson's editing, random oversampling, cluster-based oversampling, Synthetic Minority Oversampling Technique (SMOTE), and Borderline-SMOTE. This paper involves case studies of two real-world software systems and builds C4.5- and RIPPER-based software quality models both before and after applying a given data sampling technique. It is empirically shown that Evolutionary Sampling improves performance of software quality models for high-assurance systems and is significantly better than most existing data sampling techniques. 相似文献

9.

Using the DEMO methodology for modeling open source software development processes

Philip Huysmans Kris Ven Jan Verelst 《Information and Software Technology》2010,52(6):656-671

相似文献

10.

Modeling the relationship between source code complexity andmaintenance difficulty 总被引：1，自引：0，他引：1

Lanning D.L. Khoshgoftaar T.M. 《Computer》1994,27(9):35-40

Canonical correlation analysis can be a useful exploratory tool for software engineers who want to understand relationships that are not directly observable and who are interested in understanding influences affecting past development efforts. These influences could also affect current development efforts. In this paper, we restrict our findings to one particular development effort. We do not imply that either the weights or the loadings of the relations generalize to all software development efforts. Such generalization is untenable, since the model omitted many important influences on maintenance difficulty. Much work remains to specify subsets of indicators and development efforts for which the technique becomes useful as a predictive tool. Canonical correlation analysis is explained as a restricted form of soft modeling. We chose this approach not only because the terminology and graphical devices of soft modeling allow straightforward high-level explanations, but also because we are interested in the general method. The general method allows models involving many latent variables having interdependencies. It is intended for modeling complex interdisciplinary systems having many variables and little established theory. Further, it incorporates parameter estimation techniques relying on no distributional assumptions. Future research will focus on developing general soft models of the software development process for both exploratory analysis and prediction of future performance 相似文献

11.

Mining software repositories for comprehensible software fault prediction models

Olivier Vandecruys Bart Baesens Christophe Mues 《Journal of Systems and Software》2008,81(5):823-839

Software managers are routinely confronted with software projects that contain errors or inconsistencies and exceed budget and time limits. By mining software repositories with comprehensible data mining techniques, predictive models can be induced that offer software managers the insights they need to tackle these quality and budgeting problems in an efficient way. This paper deals with the role that the Ant Colony Optimization (ACO)-based classification technique AntMiner+ can play as a comprehensible data mining technique to predict erroneous software modules. In an empirical comparison on three real-world public datasets, the rule-based models produced by AntMiner+ are shown to achieve a predictive accuracy that is competitive to that of the models induced by several other included classification techniques, such as C4.5, logistic regression and support vector machines. In addition, we will argue that the intuitiveness and comprehensibility of the AntMiner+ models can be considered superior to the latter models. 相似文献

12.

Hinging hyperplane based regression tree identified by fuzzy clustering and its application

Tamás Kenesei János Abonyi 《Applied Soft Computing》2013,13(2):782-792

Hierarchical fuzzy modeling techniques have great advantage since model accuracy and complexity can be easily controlled thanks to the transparent model structures. A novel tool for regression tree identification is proposed based on the synergistic combination of fuzzy c-regression clustering and the concept of hierarchical modeling. In a special case (c = 2), fuzzy c-regression clustering can be used for identification of hinging hyperplane models. The proposed method recursively identifies a hinging hyperplane model that contains two linear submodels by partitioning operating region of one local linear model resulting a binary regression tree. Novel measures of model performance and complexity are developed to support the analysis and building of the proposed special model structure. Effectiveness of proposed model is demonstrated by benchmark regression datasets. Examples also demonstrate that the proposed model can effectively represent nonlinear dynamical systems. Thanks to the piecewise linear model structure the resulted regression tree can be easily utilized in model predictive control. A detailed application example related to the model predictive control of a water heater demonstrate that the proposed framework can be effectively used in modeling and control of dynamical systems. 相似文献

13.

Analogy-Based Practical Classification Rules for Software Quality Estimation 总被引：3，自引：0，他引：3

Taghi M. Khoshgoftaar Naeem Seliya 《Empirical Software Engineering》2003,8(4):325-350

Software metrics-based quality estimation models can be effective tools for identifying which modules are likely to be fault-prone or not fault-prone. The use of such models prior to system deployment can considerably reduce the likelihood of faults discovered during operations, hence improving system reliability. A software quality classification model is calibrated using metrics from a past release or similar project, and is then applied to modules currently under development. Subsequently, a timely prediction of which modules are likely to have faults can be obtained. However, software quality classification models used in practice may not provide a useful balance between the two misclassification rates, especially when there are very few faulty modules in the system being modeled.This paper presents, in the context of case-based reasoning, two practical classification rules that allow appropriate emphasis on each type of misclassification as per the project requirements. The suggested techniques are especially useful for high-assurance systems where faulty modules are rare. The proposed generalized classification methods emphasize on the costs of misclassifications, and the unbalanced distribution of the faulty program modules. We illustrate the proposed techniques with a case study that consists of software measurements and fault data collected over multiple releases of a large-scale legacy telecommunication system. In addition to investigating the two classification methods, a brief relative comparison of the techniques is also presented. It is indicated that the level of classification accuracy and model-robustness observed for the case study would be beneficial in achieving high software reliability of its subsequent system releases. Similar observations are made from our empirical studies with other case studies. 相似文献

14.

On the impact of disproportional samples in credit scoring models: An application to a Brazilian bank data

Francisco Louzada Paulo H. Ferreira-Silva Carlos A.R. Diniz 《Expert systems with applications》2012,39(9):8071-8078

Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated based on measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information theoretical measures, such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004) applied to simulated data. Also, as a case study, the methodology is illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results so far revealed that there is no statistically significant difference in terms of predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, there is strong difference between the distributions of the estimated default probabilities from these two statistical modeling techniques, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples. 相似文献

15.

A probabilistic model for predicting software development effort 总被引：2，自引：0，他引：2

Pendharkar P.C. Subramanian G.H. Rodger J.A. 《IEEE transactions on pattern analysis and machine intelligence》2005,31(7):615-624

Recently, Bayesian probabilistic models have been used for predicting software development effort. One of the reasons for the interest in the use of Bayesian probabilistic models, when compared to traditional point forecast estimation models, is that Bayesian models provide tools for risk estimation and allow decision-makers to combine historical data with subjective expert estimates. In this paper, we use a Bayesian network model and illustrate how a belief updating procedure can be used to incorporate decision-making risks. We develop a causal model from the literature and, using a data set of 33 real-world software projects, we illustrate how decision-making risks can be incorporated in the Bayesian networks. We compare the predictive performance of the Bayesian model with popular nonparametric neural-network and regression tree forecasting models and show that the Bayesian model is a competitive model for forecasting software development effort. 相似文献

16.

A general modeling and analysis framework for software fault detection and correction process

Yu Liu Duo Li Lujia Wang Qingpei Hu 《Software Testing, Verification and Reliability》2016,26(5):351-365

Software reliability growth modeling plays an important role in software reliability evaluation. To incorporate more information and provide more accurate analysis, modeling software fault detection and correction processes has attracted widespread research attention recently. In modeling software correction processes, the assumption of fault correction time is relaxed from constant delay to random delay. However, stochastic distribution of fault correction time brings more difficulties in modeling and corresponding parameter estimation. In this paper, a framework of software reliability models containing both information from software fault detection process and correction process is studied. Different from previous extensions on software reliability growth modeling, the proposed approach is based on Markov model other than a nonhomogeneous Poisson process model. Also, parameter estimation is carried out with weighted least‐square estimation method, which emphasizes the influence of later data on the prediction. Two data sets from practical software development projects are applied with the proposed framework, which shows satisfactory performance with the results. Copyright © 2016 John Wiley & Sons, Ltd. 相似文献

17.

The impact of software evolution and reuse on software quality

Taghi M. Khoshgoftaar Edward B. Allen Kalai S. Kalaichelvan Nishith Goel 《Empirical Software Engineering》1996,1(1):31-44

This paper presents a case study of a software project in the maintenance phase. The case study was based on a sample of modules, representing about 1.3 million lines of code, from a very large telecommunications system. Software quality models were developed to predict the number of faults expected from the coding through operations phases. Since modules from the prior release were often reused to develop a new release, one model incorporated reuse data as additional independent variables. We compare this model's performance to a similar model without reuse data.Software quality models often have product metrics as the only input data for predicting quality. There is an implicit assumption that all the modules have had a similar development history, so that product attributes are the primary drivers of different quality levels. Reuse of software as components and software evolution do not fit this assumption very well, and consequently, traditional models for such environments may not have adequate accuracy. Focusing on the software maintenance phase, this study demonstrated that reuse data can significantly improve the predictive accuracy of software quality models. 相似文献

18.

A fuzzy regression and optimization approach for setting target levels in software quality function deployment

Zeynep Sener E. Ertugrul Karsak 《Software Quality Journal》2010,18(3):323-339

With the rapid development of the software industry, improving the quality of software development has gained increasing importance. Software manufacturers have recently applied quality improvement techniques to software development to respond to the needs for software quality. Software quality function deployment (SQFD), as a technique for improving the quality of the software development process to create products responsive to customer expectations, is used to maximize customer satisfaction. This paper presents a fuzzy regression and optimization approach to determine target levels in SQFD. The inherent fuzziness of relationships in SQFD modeling justifies the use of fuzzy regression. Fuzzy regression is used to identify the functional relationships between customer requirements and technical attributes, and among technical attributes. Then, a mathematical programming model is developed to determine target levels of technical attributes using the functional relationships obtained by fuzzy regression. A search engine quality improvement problem is presented to illustrate the application of the proposed approach. 相似文献

19.

基于同态映射的从UML导出可综合Verilog算法

沈筱彦陈杰《计算机科学》2006,33(4):247-249

UML建模因其可显著提高开发效率和代码质量已经成为软件开发领域的一大热点，而硬件设计的日益复杂性也要求我们在更高层次抽象上分析和验证系统行为，故更精细的系统级建模方法变得日趋重要。本文构建了UML元模型与可综合Verilog间的同态映射，定义了一个从UML模型子集导出可综合Verilog描述的算法，为UML模型对于建模硬件系统提供了形式化的语义，从而使运用UML进行硬件系统级建模和系统级上验证系统性能和功能正确性成为可能。相似文献

20.

Mixtures of Gaussian process models for human pose estimation

Martin Fergie Aphrodite Galata 《Image and vision computing》2013

Discriminative human pose estimation is the problem of inferring the 3D articulated pose of a human directly from an image feature. This is a challenging problem due to the highly non-linear and multi-modal mapping from the image feature space to the pose space. To address this problem, we propose a model employing a mixture of Gaussian processes where each Gaussian process models a local region of the pose space. By employing the models in this way we are able to overcome the limitations of Gaussian processes applied to human pose estimation — their O(N³) time complexity and their uni-modal predictive distribution. Our model is able to give a multi-modal predictive distribution where each mode is represented by a different Gaussian process prediction. A logistic regression model is used to give a prior over each expert prediction in a similar fashion to previous mixture of expert models. We show that this technique outperforms existing state of the art regression techniques on human pose estimation data sets for ballet dancing, sign language and the HumanEva data set. 相似文献