Found 20 similar documents (search time: 15 ms)
1.
2.
Comparative Assessment of Software Quality Classification Techniques: An Empirical Case Study    Total citations: 4 (self-citations: 0, citations by others: 4)
Software metrics-based quality classification models predict a software module as either fault-prone (fp) or not fault-prone (nfp). Timely application of such models can assist in directing quality improvement efforts to modules that are likely to be fp during operations, thereby cost-effectively utilizing the software quality testing and enhancement resources. Since several classification techniques are available, a relative comparative study of some commonly used classification techniques can be useful to practitioners. We present a comprehensive evaluation of the relative performances of seven classification techniques and/or tools. These include logistic regression, case-based reasoning, classification and regression trees (CART), tree-based classification with S-PLUS, and the Sprint-Sliq, C4.5, and Treedisc algorithms. The expected cost of misclassification (ECM) is introduced as a singular unified measure to compare the performances of different software quality classification models. A function of the costs of the Type I (an nfp module misclassified as fp) and Type II (a fp module misclassified as nfp) misclassifications, ECM is computed for different cost ratios. Evaluating software quality classification models in the presence of varying cost ratios is important, because the usefulness of a model is dependent on the system-specific costs of misclassifications. Moreover, models should be compared and preferred for cost ratios that fall within the range of interest for the given system and project domain. Software metrics were collected from four successive releases of a large legacy telecommunications system. A two-way ANOVA randomized-complete block design modeling approach is used, in which the system release is treated as a block, while the modeling method is treated as a factor. It is observed that the predictive performances of the models are significantly different across the system releases, implying that in the software engineering domain prediction models are influenced by the characteristics of the data and the system being modeled. Multiple pairwise comparisons are performed to evaluate the relative performances of the seven models for the cost ratios of interest to the case study. In addition, the performance of the seven classification techniques is also compared with a classification based on lines of code. The comparative approach presented in this paper can also be applied to other software systems.
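As a hedged illustration of the ECM measure described above, the sketch below computes one common form of expected cost of misclassification from Type I and Type II error counts over a range of cost ratios; the normalization and the example counts are assumptions, not values from the study.

```python
# Hedged sketch: one common way to compute expected cost of misclassification
# (ECM) for a two-group (fp/nfp) model. The study's exact normalization may differ.

def expected_cost_of_misclassification(n_type1, n_type2, n_total, cost_ratio, c1=1.0):
    """n_type1    -- nfp modules misclassified as fp (Type I errors)
    n_type2    -- fp modules misclassified as nfp (Type II errors)
    n_total    -- total number of modules
    cost_ratio -- C_II / C_I, cost of a Type II error relative to a Type I error
    c1         -- cost of a single Type I error (arbitrary unit)"""
    c2 = cost_ratio * c1
    return (c1 * n_type1 + c2 * n_type2) / n_total

# Compare models across the cost ratios of interest to a given project domain.
for ratio in (10, 25, 50):
    print(ratio, expected_cost_of_misclassification(n_type1=30, n_type2=12,
                                                    n_total=500, cost_ratio=ratio))
```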
3.
The primary aim of risk-based software quality classification models is to detect, prior to testing or operations, components that are most likely to be of high risk. Their practical usage as quality assurance tools is gauged by the prediction-accuracy and cost-effectiveness aspects of the models. Classifying modules into two risk groups is the more commonly practiced trend. Such models assume that all modules predicted as high-risk will be subjected to quality improvements. Due to the always-limited reliability improvement resources and the variability of the quality risk factor, a more focused classification model may be desired to achieve cost-effective software quality assurance goals. In such cases, calibrating a three-group (high-risk, medium-risk, and low-risk) classification model is more rewarding. We present an innovative method that circumvents the complexities, computational overhead, and difficulties involved in calibrating pure or direct three-group classification models. With the application of the proposed method, practitioners can utilize an existing two-group classification algorithm thrice in order to yield the three risk-based classes. An empirical approach is taken to investigate the effectiveness and validity of the proposed technique. Some commonly used classification techniques are studied to demonstrate the proposed methodology. They include the C4.5 decision tree algorithm, discriminant analysis, and case-based reasoning. For the first two, we compare the three-group model calibrated using the respective techniques with the one built by applying the proposed method. Any two-group classification technique can be employed by the proposed method, including those that do not provide a direct three-group classification model, e.g., logistic regression and certain binary classification trees, such as CART. Based on a case study of a large-scale industrial software system, it is observed that the proposed method yielded promising results. For a given classification technique, the expected cost of misclassification of the proposed three-group models was generally significantly better than that of the technique's direct three-group model. In addition, the proposed method is also evaluated against an alternate indirect three-group classification method.
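The abstract does not detail how the two-group algorithm is applied three times; the sketch below shows one plausible one-vs-rest decomposition in which the same binary learner is induced thrice and the three scores are combined. The learner choice, class names, and tie-breaking by predicted probability are assumptions, not the paper's actual staging.

```python
# Hedged sketch: yielding three risk groups from a two-group learner applied
# three times (one-vs-rest). The decomposition and tie-breaking are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

GROUPS = ("high", "medium", "low")

def fit_indirect_three_group(X, y):
    """y is a NumPy array holding 'high', 'medium', or 'low' for each module."""
    models = {}
    for g in GROUPS:
        models[g] = DecisionTreeClassifier().fit(X, (y == g))   # one binary model per group
    return models

def predict_indirect_three_group(models, X):
    # Score each module against each group and pick the highest-scoring group.
    scores = np.column_stack([models[g].predict_proba(X)[:, 1] for g in GROUPS])
    return [GROUPS[i] for i in np.argmax(scores, axis=1)]
```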
4.
Accuracy of machine learners is affected by the quality of the data the learners are induced on. In this paper, the quality of the training dataset is improved by removing instances detected as noisy by the Partitioning Filter. The fit dataset is first split into subsets, and different base learners are induced on each of these splits. The predictions are combined in such a way that an instance is identified as noisy if it is misclassified by a certain number of base learners. Two versions of the Partitioning Filter are used: the Multiple-Partitioning Filter and the Iterative-Partitioning Filter. The number of instances removed by the filters is tuned by the voting scheme of the filter and the number of iterations. The primary aim of this study is to compare the predictive performances of the final models built on the filtered and the un-filtered training datasets. A case study of software measurement data from a high-assurance software project is performed. It is shown that the predictive performances of models built on the filtered fit datasets and evaluated on a noisy test dataset are generally better than those of models built on the noisy (un-filtered) fit dataset. However, predictive performance based on certain aggressive filters is affected by the presence of noise in the evaluation dataset.
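A minimal sketch of a partitioning-style noise filter follows, assuming a decision-tree base learner and a simple voting threshold; the actual Multiple- and Iterative-Partitioning Filters studied in the paper involve further details (e.g., iteration) not reproduced here.

```python
# Hedged sketch of a partitioning-style noise filter: split the fit data,
# induce a base learner on each split, and flag instances misclassified by at
# least `votes_needed` learners. Learner choice and threshold are assumptions.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def partitioning_filter(X, y, n_partitions=5, votes_needed=3):
    """X, y are NumPy arrays; returns the filtered data and the noise mask."""
    misclassified_votes = np.zeros(len(y), dtype=int)
    for _, part_idx in KFold(n_splits=n_partitions, shuffle=True).split(X):
        learner = DecisionTreeClassifier().fit(X[part_idx], y[part_idx])
        misclassified_votes += (learner.predict(X) != y)   # each learner votes on all instances
    noisy = misclassified_votes >= votes_needed
    return X[~noisy], y[~noisy], noisy
```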
5.
The resources allocated for software quality assurance and improvement have not increased with the ever-increasing need for better software quality. A targeted software quality inspection can detect faulty modules and reduce the number of faults occurring during operations. We present a software fault prediction modeling approach with case-based reasoning (CBR), a part of the computational intelligence field focusing on automated reasoning processes. A CBR system functions as a software fault prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for fault prediction, and a solution algorithm. The selection of a particular similarity function and solution algorithm may affect the performance accuracy of a CBR-based software fault prediction system. This paper presents an empirical study investigating the effects of using three different similarity functions and two different solution algorithms on the prediction accuracy of our CBR system. The influence of varying the number of nearest neighbor cases on the performance accuracy is also explored. Moreover, the benefits of using metric-selection procedures for our CBR system are also evaluated. Case studies of a large legacy telecommunications system are used for our analysis. It is observed that the CBR system using the Mahalanobis distance similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the CBR models have better performance than models based on multiple linear regression.
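As a hedged sketch of the retrieval-and-solution step described above, the code below retrieves the k most similar past modules by Mahalanobis distance and combines their fault counts with inverse-distance weighting; the choice of k and the numerical details are illustrative, not the study's configuration.

```python
# Hedged sketch of a CBR fault-prediction step: Mahalanobis-distance retrieval
# plus inverse-distance-weighted solution. Parameters (k, eps) are illustrative.
import numpy as np

def cbr_predict_faults(case_metrics, case_faults, new_module, k=5, eps=1e-6):
    """case_metrics: (n, p) metrics of previously developed modules.
    case_faults:  (n,) observed fault counts for those modules.
    new_module:   (p,) metrics vector for the module under development."""
    inv_cov = np.linalg.pinv(np.cov(case_metrics, rowvar=False))
    diffs = case_metrics - new_module
    dists = np.sqrt(np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs))  # Mahalanobis distances
    nearest = np.argsort(dists)[:k]                                   # k nearest neighbor cases
    weights = 1.0 / (dists[nearest] + eps)                            # inverse-distance weights
    return float(np.dot(weights, case_faults[nearest]) / weights.sum())
```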
Taghi M. Khoshgoftaar is a professor of the Department of Computer Science and Engineering, Florida Atlantic University, and the Director of the Empirical Software Engineering Laboratory. His research interests are in software engineering, software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, and statistical modeling. He has published more than 200 refereed papers in these areas. He has been a principal investigator and project leader in a number of projects with industry, government, and other research-sponsoring agencies. He is a member of the Association for Computing Machinery, the IEEE Computer Society, and the IEEE Reliability Society. He served as the general chair of the 1999 International Symposium on Software Reliability Engineering (ISSRE’99) and the general chair of the 2001 International Conference on Engineering of Computer Based Systems. Also, he has served on technical program committees of various international conferences, symposia, and workshops. He has served as North American editor of the Software Quality Journal, and is on the editorial boards of the journals Empirical Software Engineering, Software Quality, and Fuzzy Systems.
Naeem Seliya received the M.S. degree in Computer Science from Florida Atlantic University, Boca Raton, FL, USA, in 2001. He is currently a Ph.D. candidate in the Department of Computer Science and Engineering at Florida Atlantic University. His research interests include software engineering, computational intelligence, data mining, software measurement, software reliability and quality engineering, software architecture, computer data security, and network intrusion detection. He is a student member of the IEEE Computer Society and the Association for Computing Machinery.
6.
When building software quality models, the approach often consists of training data mining learners on a single fit dataset. Typically, this fit dataset contains software metrics collected during a past release of the software project that we want to predict the quality of. In order to improve the predictive accuracy of such quality models, it is common practice to combine the predictive results of multiple learners to take advantage of their respective biases. Although multi-learner classifiers have been proven to be successful in some cases, the improvement is not always significant because the information in the fit dataset can sometimes be insufficient. We present an innovative method to build software quality models using majority voting to combine the predictions of multiple learners induced on multiple training datasets. To our knowledge, no previous study in software quality has attempted to take advantage of multiple software project data repositories, which are generally spread across the organization. In a large scale empirical study involving seven real-world datasets and seventeen learners, we show that, on average, combining the predictions of one learner trained on multiple datasets significantly improves the predictive performance compared to one learner induced on a single fit dataset. We also demonstrate empirically that combining multiple learners trained on a single training dataset does not significantly improve the average predictive accuracy compared to the use of a single learner induced on a single fit dataset.
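A minimal sketch of the multi-dataset majority-voting idea follows, assuming a naive Bayes base learner and binary (0 = nfp, 1 = fp) labels; the seventeen learners and seven datasets of the study are not reproduced here.

```python
# Hedged sketch: induce the *same* learner on several project datasets and
# combine their predictions on the target release by majority vote.
# Base learner, labels (0/1), and tie-breaking are assumptions.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def majority_vote_across_datasets(fit_datasets, X_target):
    """fit_datasets: list of (X, y) pairs, one per software project repository."""
    votes = []
    for X_fit, y_fit in fit_datasets:
        learner = GaussianNB().fit(X_fit, y_fit)     # same base learner on each repository
        votes.append(learner.predict(X_target))
    votes = np.array(votes)                          # shape: (n_datasets, n_modules)
    # Majority vote per module; ties resolved toward the fault-prone class here.
    return (votes.mean(axis=0) >= 0.5).astype(int)
```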
Taghi M. Khoshgoftaar is a professor of the Department of Computer Science and Engineering, Florida Atlantic University, and the Director of the Empirical Software Engineering and Data Mining and Machine Learning Laboratories. His research interests are in software engineering, software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, machine learning, and statistical modeling. He has published more than 350 refereed papers in these areas. He is a member of the IEEE, IEEE Computer Society, and IEEE Reliability Society. He was the program chair and general chair of the IEEE International Conference on Tools with Artificial Intelligence in 2004 and 2005, respectively, and is the program chair of the 20th International Conference on Software Engineering and Knowledge Engineering (2008). He has served on technical program committees of various international conferences, symposia, and workshops. Also, he has served as North American Editor of the Software Quality Journal, and is on the editorial boards of the journals Software Quality and Fuzzy Systems.
Pierre Rebours received the M.S. degree in Computer Engineering from Florida Atlantic University, Boca Raton, FL, USA, in April 2004. His research interests include quality of data and data mining.
Naeem Seliya is an Assistant Professor of Computer and Information Science at the University of Michigan-Dearborn. He received his Ph.D. in Computer Engineering from Florida Atlantic University, Boca Raton, FL, USA, in 2005. His research interests include software engineering, data mining and machine learning, software measurement, software reliability and quality engineering, software architecture, computer data security, and network intrusion detection. He is a member of the IEEE and the Association for Computing Machinery.
7.
This paper presents a case study of a software product company that has successfully integrated practices from software product line engineering and agile software development. We show how practices from the two fields support the company’s strategic and tactical ambitions, respectively. We also discuss how the company integrates strategic, tactical and operational processes to optimize collaboration and consequently improve its ability to meet market needs, opportunities and challenges. The findings from this study are relevant to software product companies seeking ways to balance agility and product management. The findings also contribute to research on industrializing software engineering.
8.
Taghi M. Khoshgoftaar, Robert M. Szabo, Timothy G. Woodcock. Software Quality Journal, 1994, 3(3): 137-151
In this paper, we report the results of a study conducted on a large commercial software system written in assembly language. Unlike studies of the past, our data represent the unit-test and integration phases and all categories of the maintenance phase: adaptive, perfective, and corrective. The results confirm that faults and change activity are related to software measurements. In addition, we report the relationship between the number of design change requests and software measurements. This new observation has the potential to aid the software engineering management process. Finally, we demonstrate the value of multiple regression models over simple regression models.
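To make the simple-versus-multiple regression comparison concrete, the sketch below fits both on synthetic metric data and compares fit quality; the metric names and data are placeholders, not the study's measurements.

```python
# Hedged sketch: a simple one-metric regression of fault counts versus a
# multiple regression on several measurements. Data are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
metrics = rng.normal(size=(200, 3))                  # e.g. size, fan-out, cyclomatic complexity
faults = 2.0 * metrics[:, 0] + 1.5 * metrics[:, 1] + rng.normal(size=200)

simple = LinearRegression().fit(metrics[:, :1], faults)     # size only
multiple = LinearRegression().fit(metrics, faults)          # all measurements
print("simple R^2:  ", r2_score(faults, simple.predict(metrics[:, :1])))
print("multiple R^2:", r2_score(faults, multiple.predict(metrics)))
```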
9.
Keith I. Watson. Software Quality Journal, 1992, 1(4): 193-208
This paper describes a case study in the use of the COCOMO cost estimation model as a tool to provide an independent prognosis and validation of the schedule of a software project at IBM UK Laboratories Ltd, Hursley. Clearly, case studies run the risk of being anecdotal; however, software engineers often work in situations where sufficient historical data are not available to calibrate models to the local environment. It is often necessary for the software engineer to attempt to use such tools on individual projects to justify their further use. This case study describes how we began to use COCOMO and concentrates on some of the problems and benefits encountered when trying to use COCOMO in a live development environment. The paper begins by discussing some problems in mapping the COCOMO phases onto the IBM development process. The practical aspects of gathering the development parameters of the model are described, and the results of the work are presented in comparison to a schedule assessment using other prognosis techniques and the planned schedule at other milestones in the project's history. Some difficulties experienced in interpreting the data output from the model are discussed. This is followed by a brief comparison with other schedule analysis techniques used in quality assurance. We hope this case study shows that, despite the problems encountered in using models such as COCOMO, there are significant benefits in helping the user understand what is required to use such tools more effectively and to improve software development cost estimates in the future.
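For readers unfamiliar with the model, the sketch below implements the basic COCOMO effort and schedule equations with the commonly published coefficient sets; the Hursley project's own calibration, cost drivers, and phase mapping discussed above are not reproduced.

```python
# Hedged sketch of the basic COCOMO equations with the commonly published
# coefficients; a specific project would use calibrated values and cost drivers.
COCOMO_BASIC = {
    # mode: (a, b, c, d) -> effort = a * KLOC**b person-months,
    #                       schedule = c * effort**d elapsed months
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}

def cocomo_basic(kloc, mode="semi-detached"):
    a, b, c, d = COCOMO_BASIC[mode]
    effort = a * kloc ** b          # person-months
    schedule = c * effort ** d      # elapsed months
    return effort, schedule

print(cocomo_basic(50))  # e.g. a 50 KLOC semi-detached project
```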
10.
Evolving diverse ensembles using genetic programming has recently been proposed for classification problems with unbalanced data. Population diversity is crucial for evolving effective algorithms. Multilevel selection strategies that involve additional colonization and migration operations have shown better performance in some applications. Therefore, in this paper, we are interested in analysing the performance of evolving diverse ensembles using genetic programming for software defect prediction with unbalanced data by using different selection strategies. We use colonization and migration operators along with three ensemble selection strategies for the multi-objective evolutionary algorithm. We compare the performance of the operators on software defect prediction datasets with varying levels of data imbalance. Moreover, to generalize the results, gain a broader view and understand the underlying effects, we replicated the same experiments on UCI datasets, which are often used in the evolutionary computing community. The use of multilevel selection strategies provides reliable results with relatively fast convergence speeds and outperforms the other evolutionary algorithms that are often used in this research area and investigated in this paper. This paper also presents a promising ensemble strategy based on a simple convex hull approach, and at the same time it raises the question whether an ensemble strategy based on the whole population should also be investigated.
11.
This paper presents the results of an empirical study on the subjective evaluation of code smells that identify poorly evolvable structures in software. We propose the use of the term software evolvability to describe the ease of further developing a piece of software and outline the research area based on four different viewpoints. Furthermore, we describe the differences between human evaluations and automatic program analysis based on software evolvability metrics. The empirical component is based on a case study in a Finnish software product company, in which we studied two topics. First, we looked at the effect of the evaluator when subjectively evaluating the existence of smells in code modules. We found that the use of smells for code evaluation purposes can be difficult due to conflicting perceptions of different evaluators. However, the demographics of the evaluators partly explain the variation. Second, we applied selected source code metrics for identifying four smells and compared these results to the subjective evaluations. The metrics based on automatic program analysis and the human-based smell evaluations did not fully correlate. Based upon our results, we suggest that organizations should make decisions regarding software evolvability improvement based on a combination of subjective evaluations and code metrics. Due to the limitations of the study we also recognize the need for conducting more refined studies and experiments in the area of software evolvability.
12.
13.
An empirical investigation into the validation process within requirements determination is described in which systems analysts were asked to complete a questionnaire concerning important validation issues. We describe the major validation activities, a set of major problems experienced by the respondents, factors affecting the process and hypotheses for problem explanations. The levels of experience of the respondents and the organizations for which they work appear to be significant.
Analysts employ a very traditional approach, expressing the specification mainly in English, and they experience problems in using over-formal notations in informal situations with users, as well as problems in deriving full benefit from notations when building the specification and detecting its properties. Not all of the specification is validated and tool use is not widespread and does not appear to be effective.
We define the concepts of formal and informal view, and suggest that method and tool use will not necessarily increase in organizations as it is apparent that research into the more effective application of formal notations is necessary. In addition, it is clear that the factors that affect the validation process are not only technical, but individual and organizational, necessitating the development of suitable informal activities which take these factors into account.
14.
Girish H. Subramanian, Parag C. Pendharkar, Mary Wallace. Empirical Software Engineering, 2006, 11(4): 541-553
Several popular cost estimation models like COCOMO and function points use adjustment variables, such as software complexity and platform, to modify original estimates and arrive at final estimates. Using data on 666 programs from 15 software projects, this study empirically tests a research model that studies the influence of three adjustment variables—software complexity, computer platform, and program type (batch or online programs) on software effort. The results confirm that all the three adjustment variables have a significant effect on effort. Further, multiple comparison of means also points to two other results for the data examined. Batch programs involve significantly higher software effort than online programs. Programs rated as complex have significantly higher effort than programs rated as average.
15.
Context: There are many claimed advantages for the use of design patterns and their impact on software quality. However, there is not enough empirical evidence to support these claimed benefits, and some studies have found contrary results. Objective: This empirical study aims to quantitatively measure and compare the fault density of motifs of design patterns in object-oriented systems at different levels: design level, category level, motif level, and role level. Method: An empirical study was conducted that involved five open-source software systems. Data were analyzed using appropriate statistical tests of significance. Results: There is no consistent difference in fault density between classes that participate in design motifs and non-participant classes. However, classes that participate in structural design motifs tend to be less fault-dense. For creational design motifs, it was found that there is no clear tendency for the difference in fault density. For behavioral design motifs, it was found that there is no significant difference between participant classes and non-participant classes. We observed associations between five design motifs (Builder, Factory Method, Adapter, Composite and Decorator) and fault density. At the role level, we found that only one pair of roles (Adapter vs. Client) shows a significant difference in fault density. Conclusion: There is no clear tendency for the difference in fault density between participant and non-participant classes in design motifs. However, structural design motifs have a negative association with fault density. The Builder design motif has a positive association with fault density whilst the Factory Method, Adapter, Composite, and Decorator design motifs have negative associations with fault density. Classes that participate in the Adapter role are less dense in faults than classes that participate in the Client role.
16.
Packages are important high-level organizational units for large object-oriented systems. Package-level metrics characterize the attributes of packages such as size, complexity, and coupling. There is a need for empirical evidence to support the collection of these metrics and their use as early indicators of some important external software quality attributes. In this paper, three suites of package-level metrics (Martin, MOOD and CK) are evaluated and compared empirically in predicting the number of pre-release faults and the number of post-release faults in packages. Eclipse, one of the largest open source systems, is used as a case study. The results indicate that the prediction models based on the Martin suite are more accurate than those based on the MOOD and CK suites across releases of Eclipse.
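As a hedged illustration of the Martin suite mentioned above, the sketch below computes afferent/efferent coupling, instability, abstractness, and distance from the main sequence for a package; the dependency input format and package names are assumptions, not the paper's tooling.

```python
# Hedged sketch of the core Martin package metrics; the input representation
# (dependency dicts, class counts) is an assumption for illustration only.
def martin_metrics(depends_on, abstract_classes, total_classes, pkg):
    """depends_on: dict  package -> set of packages it uses.
    abstract_classes / total_classes: dicts  package -> class counts."""
    ce = len(depends_on.get(pkg, set()))                              # efferent coupling (outgoing)
    ca = sum(1 for p, deps in depends_on.items() if pkg in deps)      # afferent coupling (incoming)
    instability = ce / (ca + ce) if (ca + ce) else 0.0                # I = Ce / (Ca + Ce)
    abstractness = abstract_classes[pkg] / total_classes[pkg]         # A = abstract / total
    distance = abs(abstractness + instability - 1.0)                  # D' = |A + I - 1|
    return {"Ca": ca, "Ce": ce, "I": instability, "A": abstractness, "D": distance}

deps = {"ui": {"core", "util"}, "core": {"util"}, "util": set()}
print(martin_metrics(deps, {"core": 3, "ui": 0, "util": 1},
                     {"core": 10, "ui": 8, "util": 5}, "core"))
```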
17.
Context: Assessing software quality at the early stages of the design and development process is very difficult since most of the software quality characteristics are not directly measurable. Nonetheless, they can be derived from other measurable attributes. For this purpose, software quality prediction models have been extensively used. However, building accurate prediction models is hard due to the lack of data in the domain of software engineering. As a result, prediction models built on one data set show a significant deterioration of their accuracy when they are used to classify new, unseen data. Objective: The objective of this paper is to present an approach that optimizes the accuracy of software quality predictive models when used to classify new data. Method: This paper presents an adaptive approach that takes already built predictive models and adapts them (one at a time) to new data. We use an ant colony optimization algorithm in the adaptation process. The approach is validated on the stability of classes in object-oriented software systems and can easily be used for any other software quality characteristic. It can also be easily extended to work with software quality prediction problems involving more than two classification labels. Results: Results show that our approach outperforms the machine learning algorithm C4.5 as well as random guessing. It also preserves the expressiveness of the models, which provide not only the classification label but also guidelines to attain it. Conclusion: Our approach is an adaptive one that can be seen as taking predictive models that have already been built from common domain data and adapting them to context-specific data. This is suitable for the domain of software quality since the data is very scarce, and hence predictive models built from one data set are hard to generalize and reuse on new data.
18.
In this paper, we introduce a knowledge-based meta-model which serves as a unified resource model (URM) for integrating characteristics of major types of objects appearing in software development models (SDMs). The URM consists of resource classes and a web of relations that link different types of resources found in different kinds of models of software development. The URM includes specialized models for software systems, documents, agents, tools, and development processes. The URM has served as the basis for integrating and interoperating a number of process-centered CASE environments. The major benefit of the URM is twofold: First, it forms a higher level of abstraction supporting SDM formulation that subsumes many typical models of software development objects. Hence, it enables a higher level of reusability for existing support mechanisms of these models. Second, it provides a basis to support complex reasoning mechanisms that address issues across different types of software objects. To explore these features, we describe the URM both formally and with a detailed example, followed by a characterization of the process of SDM composition, and then by a characterization of the life cycle of activities involved in an overall model formulation process.
19.
Information and Software Technology, 2014, 56(10): 1309-1321
Context: The way global software development (GSD) activities are managed impacts knowledge transactions between team members. The former is captured in governance decisions, and the latter in a transactive memory system (TMS), a shared cognitive system for encoding, storing and retrieving knowledge between members of a group. Objective: We seek to identify how different governance decisions (such as business strategy, team configuration, task allocation) affect the structure of transactive memory systems as well as the processes developed within those systems. Method: We use both a quantitative and a qualitative approach. We collect quantitative data through an online survey to identify transactive memory systems. We analyze transactive memory structures using social network analysis techniques and we build a latent variable model to measure transactive memory processes. We further support and triangulate our results by means of interviews, which also help us examine the GSD governance modes of the participating projects. We analyze governance modes as a set of decisions based on three aspects: business strategy, team structure and composition, and task allocation. Results: Our results suggest that different governance decisions have a different impact on transactive memory systems. Offshore insourcing as a business strategy, for instance, creates tightly-connected clusters, which in turn leads to better developed transactive memory processes. We also find that within the composition and structure of GSD teams, there are boundary spanners (formal or informal) who have a better overview of the network’s activities and become central members within their network. An interesting mapping between task allocation and the composition of the network core suggests that the way tasks are allocated among distributed teams is an indicator of where expertise resides. Conclusion: We present an analytical method to examine GSD governance decisions and their effect on transactive memory systems. Our method can be used by both practitioners and researchers as a “cause and effect” tool for improving collaboration of global software teams.
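As one simple illustration of the social network analysis step, the sketch below ranks members of a hypothetical knowledge network by betweenness centrality to suggest boundary spanners; the edge list and the choice of centrality measure are assumptions, not the study's method.

```python
# Hedged sketch: spotting potential boundary spanners in a knowledge network
# via centrality. Names, edges, and the centrality measure are placeholders.
import networkx as nx

edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
         ("bob", "dave"), ("dave", "erin")]            # who asks whom for expertise
g = nx.Graph(edges)

centrality = nx.betweenness_centrality(g)
boundary_spanners = sorted(centrality, key=centrality.get, reverse=True)[:2]
print(boundary_spanners)   # members with the broadest overview of the network's activity
```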
20.
The Internet of Things (IoT) is gradually being adopted by many organizations to facilitate information collection and sharing. In an organization, an IoT node can usually receive and send emails for event notification and reminders. However, unwanted and malicious emails are a big security challenge to IoT systems. For example, attackers may intrude into a network by sending emails with phishing links. To mitigate this issue, email classification is an important solution, with the aim of distinguishing legitimate and spam emails. Artificial intelligence, especially machine learning, is a major tool for helping detect malicious emails, but its performance can fluctuate across datasets. Previous research found that supervised learning could be acceptable in practice, and that practical evaluation and users' feedback are important. Motivated by these observations, we conduct an empirical study to validate the performance of common learning algorithms under three different environments for email classification. With over 900 users, our study results validate prior observations and indicate that LibSVM and SMO-SVM can achieve better performance than other selected algorithms.
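A minimal supervised baseline in the spirit of the SVM learners discussed above is sketched below, using scikit-learn's SVC rather than the LibSVM/SMO-SVM tool wrappers named in the study; the messages, labels, and feature choice are toy placeholders, not the study's setup.

```python
# Hedged sketch: a TF-IDF + linear SVM spam/legitimate email classifier.
# Training messages and labels are toy placeholders, not the study's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

emails = ["Meeting moved to 3pm, see agenda attached",
          "Your account is locked, verify at hxxp://phish.example now",
          "Quarterly sensor report from gateway node 7",
          "You won a prize! Click this link to claim"]
labels = [0, 1, 0, 1]          # 0 = legitimate, 1 = spam/malicious

model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(emails, labels)
print(model.predict(["Please verify your account by clicking the link"]))
```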