Similar Documents
Found 20 similar documents.
1.
王培  金聪  葛贺贺 《计算机应用》2012,32(6):1738-1740
Accurately and effectively predicting defect-prone software modules during development is an important way to improve software quality. Attribute selection can significantly improve the accuracy and efficiency of software defect prediction models. This paper proposes an attribute selection method based on mutual information, and the selected optimal attribute subset is used in software defect prediction models. The method adopts a forward search strategy and introduces a nonlinear balance coefficient into the evaluation function. Experimental results show that the attribute subsets provided by the mutual-information-based attribute selection method improve the prediction accuracy and efficiency of various software defect prediction models.
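As a rough illustration of the approach this abstract describes, the sketch below runs a greedy forward search that scores each candidate attribute by its mutual information with the class. The abstract does not give the form of its nonlinear balance coefficient, so an mRMR-style redundancy penalty `beta` is assumed here, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def discretize(col, bins=10):
    # Bin a continuous feature so MI between features can be estimated.
    edges = np.histogram_bin_edges(col, bins=bins)
    return np.digitize(col, edges[1:-1])

def forward_mi_selection(X, y, k, beta=0.5):
    """Greedy forward search: maximize relevance to the class while
    penalizing redundancy with already-selected features."""
    relevance = mutual_info_classif(X, y, random_state=0)
    Xd = np.apply_along_axis(discretize, 0, X)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            red = (np.mean([mutual_info_score(Xd[:, j], Xd[:, s]) for s in selected])
                   if selected else 0.0)
            return relevance[j] - beta * red   # beta: assumed balance term
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
print(forward_mi_selection(X, y, k=5))
```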

2.
Software defect prediction aims to find potential defects based on historical data and software features. Software features can reflect the characteristics of software modules. However, some of these features may be more relevant to the class (defective or non-defective), while others may be redundant or irrelevant. To fully measure the correlation between different features and the class, we present a feature selection approach based on a similarity measure (SM) for software defect prediction. First, the feature weights are updated according to the similarity of samples in different classes. Second, a feature ranking list is generated by sorting the feature weights in descending order, and feature subsets are selected from the ranking list in sequence. Finally, each feature subset is evaluated with a k-nearest neighbor (KNN) model and measured by the area under the ROC curve (AUC) for classification performance. The experiments are conducted on 11 National Aeronautics and Space Administration (NASA) datasets, and the results show that our approach performs better than or comparably to the compared feature selection approaches in terms of classification performance.
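The pipeline shape described here (weight features by class-aware similarity, rank them, then evaluate nested subsets with KNN and AUC) can be sketched as below. The paper's SM weight update is not reproduced; a Relief-style update, which likewise rewards features that separate nearest opposite-class samples, stands in for it, on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def relief_weights(X, y):
    """For each sample, reward features that separate it from the nearest
    opposite-class sample and penalize distance to the nearest same-class one."""
    w = np.zeros(X.shape[1])
    for i in range(len(X)):
        d = np.abs(X - X[i]).sum(axis=1)
        d[i] = np.inf
        hit = np.argmin(np.where(y == y[i], d, np.inf))
        miss = np.argmin(np.where(y != y[i], d, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / len(X)

X, y = make_classification(n_samples=300, n_features=15, n_informative=4, random_state=1)
ranking = np.argsort(relief_weights(X, y))[::-1]       # descending weights
for k in range(1, len(ranking) + 1):                   # nested subsets in sequence
    subset = ranking[:k]
    auc = cross_val_score(KNeighborsClassifier(), X[:, subset], y,
                          scoring="roc_auc", cv=5).mean()
    print(f"top-{k:2d} features  AUC={auc:.3f}")
```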

3.
Recent feature selection scores using pairwise constraints (must-link and cannot-link) have shown better performance than unsupervised methods and performance comparable to supervised ones. However, these scores use only the pairwise constraints and ignore the information brought by the unlabeled data. Moreover, these constraint scores depend strongly on the given must-link and cannot-link subsets built by the user. In this paper, we address these problems and propose a new semi-supervised constraint score that uses both pairwise constraints and local properties of the unlabeled data. Experiments using Kendall's coefficient and accuracy rates show that this new score is less sensitive to the given constraints than previous scores while providing similar performance.
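A hedged sketch of the idea follows: a constraint score built from the must-link and cannot-link pairs, multiplied by a Laplacian-style locality term computed on all (unlabeled) samples. The paper's exact combination is not given in this abstract, so the product form below, and the toy constraint pairs, are purely assumptions.

```python
import numpy as np

def constraint_score(f, must_link, cannot_link):
    # Small when must-link pairs agree on feature f and cannot-link pairs differ.
    ml = sum((f[i] - f[j]) ** 2 for i, j in must_link)
    cl = sum((f[i] - f[j]) ** 2 for i, j in cannot_link)
    return ml / (cl + 1e-12)

def laplacian_term(f, X, sigma=1.0):
    # Locality preservation on all (unlabeled) data: penalize features
    # that vary a lot between nearby samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    return (S * (f[:, None] - f[None, :]) ** 2).sum() / S.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
must_link = [(0, 1), (2, 3)]       # hypothetical user-given constraints
cannot_link = [(0, 4), (1, 5)]
scores = [constraint_score(X[:, r], must_link, cannot_link) *
          laplacian_term(X[:, r], X) for r in range(X.shape[1])]
print(np.argsort(scores))          # lower score = better feature
```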

4.
New methodologies and tools have gradually made the software development life cycle less dependent on human effort. Much of the research in this field focuses on defect reduction, defect identification and defect prediction. Defect prediction is a relatively new research area that involves using various methods from artificial intelligence to data mining. Identifying and locating defects in software projects is a difficult task. Measuring software in a continuous and disciplined manner provides many advantages, such as accurate estimation of project costs and schedules and improved product and process quality. This study proposes a model to predict the number of defects in the new version of a software product with respect to the previous stable version. The new version may contain changes related to a new feature, a modification in the algorithm, or bug fixes. Our model predicts the defects introduced into the new version by analyzing the types of changes in an objective and formal manner as well as considering the change in lines of code (LOC). Defect predictors are helpful tools for both project managers and developers: accurate predictors may help reduce test times and guide developers toward implementing higher quality code. Our proposed model can aid software engineers in determining the stability of software before it goes into production. Furthermore, such a model may provide useful insight into the effects of a feature, bug fix or change on the process of defect detection.

5.
In recent years, considerable attention has been devoted to research on the link prediction (LP) problem in complex networks. The problem is to predict the likelihood that an association between two currently unconnected nodes in a network will appear in the future. One of the most important approaches to the LP problem is based on supervised machine learning (ML) techniques for classification. Although many works have presented promising results with this approach, choosing the set of features (variables) to train the classifiers is still a major challenge. In this article, we report on the effects of three different automatic variable selection strategies (Forward, Backward and Evolutionary) applied to the feature-based supervised learning approach in LP applications. The results of the experiments show that the use of these strategies does lead to better classification models than classifiers built with the complete set of variables. The experiments were performed over three datasets (Microsoft Academic Network, Amazon and Flickr), each containing more than twenty different features, including topological and domain-specific ones. We also describe the specification and implementation of the process used to support the experiments. It combines the feature selection strategies, six different classification algorithms (SVM, K-NN, naïve Bayes, CART, random forest and multilayer perceptron) and three evaluation metrics (Precision, F-Measure and Area Under the Curve). Moreover, this process includes a novel voting-committee-inspired ML approach that suggests sets of features to represent data in LP applications: it mines the log of the experiments to identify sets of features frequently selected in classification models with high performance. The experiments showed interesting correlations between frequently selected features and datasets.
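Two of the three strategies named above (Forward and Backward) are available off the shelf; the sketch below runs both as wrapper selection around a random forest on a synthetic link-prediction-style table and compares against the full feature set. The Evolutionary strategy and the real datasets are not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

# synthetic stand-in for a feature table of node-pair examples
X, y = make_classification(n_samples=500, n_features=20, n_informative=6,
                           random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

for direction in ("forward", "backward"):
    sfs = SequentialFeatureSelector(clf, n_features_to_select=6,
                                    direction=direction, scoring="roc_auc", cv=3)
    Xs = sfs.fit_transform(X, y)
    auc = cross_val_score(clf, Xs, y, scoring="roc_auc", cv=3).mean()
    print(direction, sfs.get_support(indices=True), f"AUC={auc:.3f}")

# full-feature baseline for comparison
print("all features AUC =",
      cross_val_score(clf, X, y, scoring="roc_auc", cv=3).mean())
```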

6.
7.
A number of software cost estimation methods have been presented in the literature over the past decades. Analogy based estimation (ABE), which is essentially a case based reasoning (CBR) approach, is one of the most popular techniques. To improve the performance of ABE, many previous studies proposed effective approaches to optimize the weights of the project features (feature weighting) in its similarity function. However, ABE is still criticized for low prediction accuracy, large memory requirements, and expensive computation. To alleviate these drawbacks, in this paper we propose a project selection technique for ABE (PSABE), which reduces the whole project base to a small subset consisting only of representative projects. Moreover, PSABE is combined with feature weighting to form FWPSABE for a further improvement of ABE. The proposed methods are validated on four datasets (two real-world and two artificial) and compared with conventional ABE, feature weighted ABE (FWABE), and machine learning methods. The promising results indicate that the project selection technique can significantly improve analogy based models for software cost estimation.
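A minimal sketch of the ABE core plus a crude project selection step: effort is predicted as the mean effort of the k most similar historical projects, and the case base is shrunk to cluster medoids. The medoid heuristic is only a stand-in; the paper's PSABE selection procedure is more elaborate, and the data here are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

def abe_predict(query, projects, efforts, k=3, weights=None):
    """Weighted Euclidean similarity over normalized project features;
    the estimate is the mean effort of the k nearest analogies."""
    w = np.ones(projects.shape[1]) if weights is None else weights
    d = np.sqrt((w * (projects - query) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return efforts[nearest].mean()

rng = np.random.default_rng(0)
projects = rng.uniform(size=(100, 6))              # normalized feature vectors
efforts = 10 + 50 * projects[:, 0] + rng.normal(0, 2, 100)

# crude "project selection": keep the point nearest each cluster centre
km = KMeans(n_clusters=15, n_init=10, random_state=0).fit(projects)
reps = np.array([np.argmin(((projects - c) ** 2).sum(1))
                 for c in km.cluster_centers_])

query = rng.uniform(size=6)
print("full base :", abe_predict(query, projects, efforts))
print("reduced   :", abe_predict(query, projects[reps], efforts[reps]))
```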

8.
A critique of software defect prediction models
Many organizations want to predict the number of defects (faults) in software systems before they are deployed, to gauge the likely delivered quality and maintenance effort. To help with this, numerous software metrics and statistical models have been developed, with a correspondingly large literature. We provide a critical review of this literature and the state of the art. Most of the wide range of prediction models use size and complexity metrics to predict defects. Others are based on testing data, the "quality" of the development process, or take a multivariate approach. The authors of the models have often made heroic contributions to a subject otherwise bereft of empirical studies. However, there are a number of serious theoretical and practical problems in many studies. The models are weak because of their inability to cope with the, as yet, unknown relationship between defects and failures. There are fundamental statistical and data quality problems that undermine model validity. More significantly, many prediction models tend to model only part of the underlying problem and seriously misspecify it. To illustrate these points, the Goldilocks Conjecture, that there is an optimum module size, is used to show the considerable problems inherent in current defect prediction approaches. Careful and considered analysis of past and new results shows that the conjecture lacks support and that some models are misleading. We recommend holistic models for software defect prediction, using Bayesian belief networks, as alternative approaches to the single-issue models used at present. We also argue for research into a theory of "software decomposition" in order to test hypotheses about defect introduction and help construct a better science of software engineering.

9.
Software quality engineering comprises several quality assurance activities such as testing, formal verification, inspection, fault tolerance, and software fault prediction. Many researchers have developed and validated fault prediction models using machine learning and statistical techniques. Different kinds of software metrics and diverse feature reduction techniques have been used to improve the models' performance. However, these studies did not investigate the effects of dataset size, metrics set, and feature selection techniques on software fault prediction. This study focuses on high-performance fault predictors based on machine learning, such as Random Forests, and on algorithms based on a new computational intelligence approach called Artificial Immune Systems. We used public NASA datasets from the PROMISE repository to make our predictive models repeatable, refutable, and verifiable. The research questions address the effects of dataset size, metrics set, and feature selection techniques. To answer these questions, seven test groups were defined, and nine classifiers were examined on each of the five public NASA datasets. According to this study, Random Forests provides the best prediction performance for large datasets and Naive Bayes is the best prediction algorithm for small datasets in terms of the area under the receiver operating characteristic curve (AUC). The parallel implementation of the Artificial Immune Recognition Systems (AIRS2Parallel) algorithm is the best Artificial Immune Systems paradigm-based algorithm when method-level metrics are used.

10.
To address the complex data dimensionality and class imbalance in software defect prediction, a software defect prediction method based on a surrogate-assisted multi-objective firefly algorithm (SMO-MSFFA) is proposed. The method adopts a multi-group strategy firefly algorithm (MSFFA), takes minimizing the feature selection ratio and maximizing the model's AUC as the multi-objective functions, and builds software defect prediction models with random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN) classifiers. Considering the iterative nature of evolutionary algorithms, a surrogate model is embedded to compute part of the individual fitness evaluations offline, reducing computation time. Experiments on the PC1, KC1, and MC1 projects from the public NASA datasets show that, compared with NSGA-II, the mean model AUC increases by 0.17 on PC1, decreases by 0.01 on KC1, and increases by 0.09 on MC1; the average feature selection ratio decreases by 0.08, 0.17, and 0.05 respectively; and the average runtime increases by 131 s on PC1 and decreases by 199 s and 431 s on KC1 and MC1. The results show that the proposed method has clear advantages in improving model performance, reducing the feature selection ratio, and shortening computation time.
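The two objective functions the abstract names can be written down directly: minimize the fraction of selected features and maximize cross-validated AUC. In the sketch below, random binary masks stand in for the firefly optimizer, and the surrogate model and multi-group strategy are omitted; the data are synthetic and imbalanced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=25, n_informative=6,
                           weights=[0.85], random_state=0)   # imbalanced classes

def evaluate(mask):
    """Return (feature selection ratio, AUC) for a binary feature mask.
    A multi-objective optimizer would minimize the first and maximize the second."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 1.0, 0.0
    auc = cross_val_score(RandomForestClassifier(n_estimators=50, random_state=0),
                          X[:, idx], y, scoring="roc_auc", cv=3).mean()
    return idx.size / mask.size, auc

rng = np.random.default_rng(0)
for _ in range(5):                    # random masks stand in for firefly moves
    mask = rng.integers(0, 2, X.shape[1])
    ratio, auc = evaluate(mask)
    print(f"ratio={ratio:.2f}  AUC={auc:.3f}")
```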

11.
Over the last few years, research has been oriented toward developing a machine vision system for automatically locating and identifying defects on rails. Rail defects exhibit different properties and are divided into various categories related to the type and position of flaws on the rail. Several kinds of interrelated factors cause rail defects, such as the type of rail, construction conditions, and the speed and/or frequency of trains using the rail. The aim of this paper is to present an experimental comparison among three filtering approaches, based on texture analysis of rail surfaces, to detect the presence or absence of a particular class of surface defect: corrugation.

12.
To improve the performance of prediction models and resolve the disagreement introduced by different attribute subsets, a prediction model based on partial correlation is proposed. First, on public datasets, a partial correlation is shown to exist between static code attributes and the number of defects; then, based on the partial correlation coefficients, a code complexity-density attribute value is computed; finally, a new defect prediction model is built on this attribute. Experiments show that the model achieves high recall and good F-measure performance, further confirming the conclusion that the partial correlation between code attributes and module defects is an important factor affecting software quality prediction performance. This conclusion helps in building more stable and reliable software defect prediction models.
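The partial correlation the abstract relies on has a standard first-order form; the sketch below computes it for a hypothetical complexity attribute against defect counts while controlling for module size (LOC). The choice of controlled attribute and the synthetic data are assumptions.

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))"""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

rng = np.random.default_rng(0)
loc = rng.gamma(2.0, 100, size=200)               # hypothetical module size
complexity = 0.05 * loc + rng.normal(0, 3, 200)   # complexity tracks size
defects = 0.01 * loc + 0.2 * complexity + rng.normal(0, 1, 200)

print("raw corr(complexity, defects)    :", np.corrcoef(complexity, defects)[0, 1])
print("partial corr, controlling for LOC:", partial_corr(complexity, defects, loc))
```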

13.

Context

Software defect prediction studies have usually built models using within-company data, but very few have focused on prediction models trained with cross-company data. In practice it is often difficult to employ models built on within-company data because such local data repositories are lacking. Recently, transfer learning has attracted more and more attention for building a classifier in a target domain using data from a related source domain. It is very useful when the distributions of training and test instances differ, but is it appropriate for cross-company software defect prediction?

Objective

In this paper, we consider the cross-company defect prediction scenario in which source and target data are drawn from different companies. In order to harness cross-company data, we exploit transfer learning to build a faster and more effective prediction model.

Method

Unlike prior works that select training data similar to the test data, we propose a novel algorithm called Transfer Naive Bayes (TNB) that uses the information of all the proper features in the training data. Our solution estimates the distribution of the test data and transfers cross-company data information into the weights of the training data. The defect prediction model is built on these weighted data (a minimal sketch of this weighting scheme follows the abstract).

Results

This article presents a theoretical analysis of the compared methods and reports experimental results on data sets from different organizations. The results indicate that TNB is more accurate in terms of AUC (the area under the receiver operating characteristic curve) and requires less runtime than state-of-the-art methods.

Conclusion

It is concluded that when there are too few local training data to train good classifiers, useful knowledge from differently distributed training data at the feature level may help. We are optimistic that our transfer learning method can guide optimal resource allocation strategies, which may reduce software testing cost and increase the effectiveness of the software testing process.
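A minimal sketch of the weighting idea in the Method section: estimate the test (target-company) data's per-attribute range, weight each training instance by how well it fits that range, and train a weighted classifier. The gravitation-style weight formula and the Gaussian Naive Bayes stand-in are assumptions consistent with, but not confirmed by, this abstract.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def tnb_weights(X_train, X_test):
    """Weight = s / (k - s + 1)^2, where s counts the attributes of a
    training instance that fall inside the test set's [min, max] range."""
    lo, hi = X_test.min(axis=0), X_test.max(axis=0)
    s = ((X_train >= lo) & (X_train <= hi)).sum(axis=1)
    k = X_train.shape[1]
    return s / (k - s + 1) ** 2

rng = np.random.default_rng(0)
# source company: shifted feature distribution relative to the target company
X_src = rng.normal(1.0, 1.5, size=(500, 10))
y_src = (X_src[:, 0] + X_src[:, 1] > 2.5).astype(int)
X_tgt = rng.normal(0.0, 1.0, size=(200, 10))
y_tgt = (X_tgt[:, 0] + X_tgt[:, 1] > 2.5).astype(int)

w = tnb_weights(X_src, X_tgt)
plain = GaussianNB().fit(X_src, y_src).score(X_tgt, y_tgt)
weighted = GaussianNB().fit(X_src, y_src, sample_weight=w).score(X_tgt, y_tgt)
print(f"plain NB acc={plain:.3f}  weighted NB acc={weighted:.3f}")
```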

14.
Multimedia Tools and Applications - Sentiment analysis is a domain of study that focuses on identifying and classifying the ideas expressed in the form of text into positive, negative and neutral...

15.

Software Product Line (SPL) engineering customizes software by combining various existing features of the software into multiple variants. The main challenge is selecting valid features subject to the constraints of the feature model. To address this challenge, a hybrid approach is proposed to optimize the feature selection problem in software product lines. The hybrid approach, 'Hyper-PSOBBO', is a combination of Particle Swarm Optimization (PSO), Biogeography-Based Optimization (BBO) and hyper-heuristic algorithms. The proposed algorithm is compared with the Bird Swarm Algorithm (BSA), PSO, BBO, Firefly, Genetic Algorithm (GA) and hyper-heuristic approaches. All of these algorithms are evaluated on a set of 10 feature models ranging in size from 100 to 5000 features. A detailed empirical analysis of performance has been carried out on these feature models. The results of the study indicate that the performance of the proposed method is superior to that of the other state-of-the-art algorithms.

16.
This article tackles the problem of predicting the effort (in person-hours) required to fix a software defect posted on an Issue Tracking System. The proposed method is inspired by the Nearest Neighbour Approach presented in the pioneering work of Weiss et al. (2007) [1]. We propose four enhancements to Weiss et al. (2007) [1]: Data Enrichment, Majority Voting, Adaptive Threshold and Binary Clustering. Data Enrichment infuses additional issue information into the similarity-scoring procedure, aiming to increase the accuracy of similarity scores. Majority Voting exploits the fact that many similar historical issues have repeating effort values that are close to the actual effort. Adaptive Threshold automatically adjusts the similarity threshold to ensure that we obtain only the most similar matches. We use Binary Clustering when the similarity scores are very low, since low scores might result in misleading predictions; it uses common properties of issues to form clusters (independent of the similarity scores), which are then used to produce the predictions. Numerical results are presented showing a noticeable improvement over the method proposed in Weiss et al. (2007) [1].
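The Majority Voting enhancement is easy to picture in code: among the k most similar historical issues, predict the most frequent effort value rather than the mean. In the sketch below, cosine similarity over TF-IDF issue titles is a stand-in for the paper's richer similarity-scoring procedure, and the toy issue base is invented.

```python
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical issue history: (title, effort in person-hours)
history = [
    ("crash on startup when config missing", 8),
    ("crash when config file is corrupt",    8),
    ("typo in settings dialog label",        1),
    ("wrong label in settings dialog",       1),
    ("startup crash with empty config",      8),
]
titles, efforts = zip(*history)

vec = TfidfVectorizer().fit(titles)
H = vec.transform(titles)

def predict_effort(new_title, k=3):
    sims = cosine_similarity(vec.transform([new_title]), H)[0]
    top = sims.argsort()[::-1][:k]
    votes = Counter(efforts[i] for i in top)   # majority vote over effort values
    return votes.most_common(1)[0][0]

print(predict_effort("application crashes at startup"))   # -> 8 person-hours
```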

17.
Software defect prediction aims to predict the defect proneness of new software modules from historical defect data so as to improve the quality of a software system. Software historical defect data has a complicated structure and marked class imbalance; how to fully analyze and utilize the existing historical defect data and build more precise and effective classifiers has attracted considerable interest from both academia and industry. Multiple kernel learning and ensemble learning are effective techniques in the field of machine learning. Multiple kernel learning can map the historical defect data to a higher-dimensional feature space where they are better represented, and ensemble learning can use a series of weak classifiers to reduce the bias caused by the majority class and obtain better predictive performance. In this paper, we propose to use multiple kernel learning to predict software defects. Using the characteristics of the metrics mined from open source software, we obtain a multiple kernel classifier through an ensemble learning method, which has the advantages of both multiple kernel learning and ensemble learning. We thus propose a multiple kernel ensemble learning (MKEL) approach for software defect classification and prediction. Considering the cost of risk in software defect prediction, we design a new sample weight vector updating strategy to reduce the cost of risk caused by misclassifying defective modules as non-defective ones. We employ the widely used NASA MDP datasets as test data to evaluate the performance of all compared methods; experimental results show that MKEL outperforms several representative state-of-the-art defect prediction methods.
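The multiple-kernel part can be sketched as a fixed combination of base kernels fed to a precomputed-kernel SVM. The ensemble stage and the paper's cost-sensitive weight update are omitted, and the equal kernel weights are an assumption; the data are synthetic and imbalanced like defect data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, weights=[0.8],
                           random_state=0)           # imbalanced classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def combined_kernel(A, B, mu=(1/3, 1/3, 1/3)):
    # K = mu1*linear + mu2*poly + mu3*rbf: a convex combination of base
    # kernels maps the data into a richer implicit feature space.
    return (mu[0] * linear_kernel(A, B)
            + mu[1] * polynomial_kernel(A, B, degree=2)
            + mu[2] * rbf_kernel(A, B))

svm = SVC(kernel="precomputed", class_weight="balanced")
svm.fit(combined_kernel(X_tr, X_tr), y_tr)
print("test accuracy:", svm.score(combined_kernel(X_te, X_tr), y_te))
```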

18.
Software cost estimation is one of the most crucial activities in the software development process. In the past decades, many methods have been proposed for cost estimation, and case based reasoning (CBR) is one of these techniques. Feature selection is an important preprocessing stage of case based reasoning. Most existing feature selection methods for case based reasoning are 'wrappers', which can usually yield high fitting accuracy at the cost of high computational complexity and poor explainability of the selected features. In our study, mutual information based feature selection for CBR (MICBR) is proposed. This approach hybridizes the 'wrapper' and 'filter' mechanisms; a filter is another kind of feature selector with much lower complexity than wrappers, and the features selected by filters are likely to generalize to other conditions. The MICBR is then compared with popular feature selectors and published works. The results show that the MICBR is an effective feature selector for case based reasoning, overcoming some of the limitations and computational complexities of other feature selection techniques in the field.
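A sketch of a filter-wrapper hybrid in the MICBR spirit: a cheap mutual information filter prunes the feature pool, then a greedy wrapper pass evaluates the survivors with a kNN regressor as the CBR analog. The pool size, model, and scoring choices below are assumptions, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

# Filter stage: keep the 8 features with highest MI against effort.
mi = mutual_info_regression(X, y, random_state=0)
pool = np.argsort(mi)[::-1][:8]

# Wrapper stage: greedy forward search over the pruned pool only.
selected, best, improved = [], -np.inf, True
while improved:
    improved = False
    for j in (f for f in pool if f not in selected):
        score = cross_val_score(KNeighborsRegressor(), X[:, selected + [j]], y,
                                scoring="neg_mean_absolute_error", cv=5).mean()
        if score > best:
            best, pick, improved = score, j, True
    if improved:
        selected.append(pick)
print("selected features:", selected, " MAE:", -best)
```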

19.
A heart-failure mortality prediction system is proposed to predict the 30-day mortality of heart-failure patients after the current hospitalization. Based on heart-failure patient data provided by Shanghai Shuguang Hospital, the raw data and features are first preprocessed. Because the features are redundant, the classic Relief feature selection algorithm is then used to screen out the important heart-failure features, and finally a bp-SVM algorithm is used to predict mortality. Experimental results show that the mortality prediction system achieves good performance and, by providing decision information, can assist physicians in treating patients. According to the predicted mortality risk, physicians can adopt different treatment options, improving clinical outcomes and hospital resource allocation.
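The pipeline the abstract describes (Relief screening followed by an SVM) can be sketched as below, reusing the same Relief-style update sketched under item 2 above. A synthetic table stands in for the hospital data, and a plain class-weighted SVC stands in for "bp-SVM", whose exact formulation the abstract does not give.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def relief(X, y, n_keep):
    """Classic Relief: reward features that separate the nearest miss,
    penalize distance to the nearest hit; keep the top-weighted features."""
    w = np.zeros(X.shape[1])
    for i in range(len(X)):
        d = np.abs(X - X[i]).sum(axis=1)
        d[i] = np.inf
        hit = np.argmin(np.where(y == y[i], d, np.inf))
        miss = np.argmin(np.where(y != y[i], d, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return np.argsort(w)[::-1][:n_keep]

X, y = make_classification(n_samples=500, n_features=30, n_informative=6,
                           weights=[0.9], random_state=0)   # rare positive class
X = StandardScaler().fit_transform(X)
keep = relief(X, y, n_keep=8)
X_tr, X_te, y_tr, y_te = train_test_split(X[:, keep], y, stratify=y, random_state=0)
clf = SVC(class_weight="balanced", probability=True).fit(X_tr, y_tr)
print("predicted 30-day mortality risk:", clf.predict_proba(X_te[:1])[0, 1])
```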

20.
Engineering with Computers - One of the important factors during drilling operations is the rate of penetration (ROP), which is controlled based on different variables. Factors affecting different...
