期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Software quality estimation with limited fault data: a semi-supervised learning perspective

Naeem Seliya Taghi M. Khoshgoftaar 《Software Quality Journal》2007,15(3):327-344

We addresses the important problem of software quality analysis when there is limited software fault or fault-proneness data. A software quality model is typically trained using software measurement and fault data obtained from a previous release or similar project. Such an approach assumes that fault data is available for all the training modules. Various issues in software development may limit the availability of fault-proneness data for all the training modules. Consequently, the available labeled training dataset is such that the trained software quality model may not provide predictions. More specifically, the small set of modules with known fault-proneness labels is not sufficient for capturing the software quality trends of the project. We investigate semi-supervised learning with the Expectation Maximization (EM) algorithm for software quality estimation with limited fault-proneness data. The hypothesis is that knowledge stored in software attributes of the unlabeled program modules will aid in improving software quality estimation. Software data collected from a large NASA software project is used during the semi-supervised learning process. The software quality model is evaluated with multiple test datasets collected from other NASA software projects. Compared to software quality models trained only with the available set of labeled program modules, the EM-based semi-supervised learning scheme improves generalization performance of the software quality models. 相似文献

2.

Analyzing software measurement data with clustering techniques 总被引：1，自引：0，他引：1

Zhong S. Khoshgoftaar T.M. Seliya N. 《Intelligent Systems, IEEE》2004,19(2):20-27

For software quality estimation, software development practitioners typically construct quality-classification or fault prediction models using software metrics and fault data from a previous system release or a similar software project. Engineers then use these models to predict the fault proneness of software modules in development. Software quality estimation using supervised-learning approaches is difficult without software fault measurement data from similar projects or earlier system releases. Cluster analysis with expert input is a viable unsupervised-learning solution for predicting software modules' fault proneness and potential noisy modules. Data analysts and software engineering experts can collaborate more closely to construct and collect more informative software metrics. 相似文献

3.

Initialization independent clustering with actively self-training method

Nie F Xu D Li X 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2012,42(1):17-27

The results of traditional clustering methods are usually unreliable as there is not any guidance from the data labels, while the class labels can be predicted more reliable by the semisupervised learning if the labels of partial data are given. In this paper, we propose an actively self-training clustering method, in which the samples are actively selected as training set to minimize an estimated Bayes error, and then explore semisupervised learning to perform clustering. Traditional graph-based semisupervised learning methods are not convenient to estimate the Bayes error; we develop a specific regularization framework on graph to perform semisupervised learning, in which the Bayes error can be effectively estimated. In addition, the proposed clustering algorithm can be readily applied in a semisupervised setting with partial class labels. Experimental results on toy data and real-world data sets demonstrate the effectiveness of the proposed clustering method on the unsupervised and the semisupervised setting. It is worthy noting that the proposed clustering method is free of initialization, while traditional clustering methods are usually dependent on initialization. 相似文献

4.

半监督软件缺陷挖掘研究综述 总被引：3，自引：0，他引：3

黎铭霍轩《数据采集与处理》2016,31(1):56-64

软件质量是计算机系统安全可靠运行的保障,而软件缺陷是导致软件质量低下的重要诱因。软件缺陷挖掘技术凭借其能够通过对软件代码及其相关数据进行分析建模,发现软件系统潜在的缺陷,已得到了软件质量保障领域的广泛关注。要准确发现软件模块中潜在的缺陷,需要利用大量带有缺陷情况标注的模块进行学习。然而,缺陷情况标注往往需要通过详细测试或人工代码检查获取,要消耗大量测试和人工资源,在实际应用中难以满足,这严重制约了软件缺陷挖掘的性能。针对这一问题,半监督学习技术被引入软件缺陷挖掘,通过对大量缺少标注的模块进行利用,辅助提升软件缺陷挖掘的性能。本文对半监督缺陷挖掘技术的研究现状进行综述。首先综述了软件缺陷挖掘研究现状,然后简要介绍了半监督学习的4种学习范式;最后系统梳理了基于半监督学习进行软件缺陷挖掘的多种方法与技术。相似文献

5.

Analogy-Based Practical Classification Rules for Software Quality Estimation 总被引：3，自引：0，他引：3

Taghi M. Khoshgoftaar Naeem Seliya 《Empirical Software Engineering》2003,8(4):325-350

Software metrics-based quality estimation models can be effective tools for identifying which modules are likely to be fault-prone or not fault-prone. The use of such models prior to system deployment can considerably reduce the likelihood of faults discovered during operations, hence improving system reliability. A software quality classification model is calibrated using metrics from a past release or similar project, and is then applied to modules currently under development. Subsequently, a timely prediction of which modules are likely to have faults can be obtained. However, software quality classification models used in practice may not provide a useful balance between the two misclassification rates, especially when there are very few faulty modules in the system being modeled.This paper presents, in the context of case-based reasoning, two practical classification rules that allow appropriate emphasis on each type of misclassification as per the project requirements. The suggested techniques are especially useful for high-assurance systems where faulty modules are rare. The proposed generalized classification methods emphasize on the costs of misclassifications, and the unbalanced distribution of the faulty program modules. We illustrate the proposed techniques with a case study that consists of software measurements and fault data collected over multiple releases of a large-scale legacy telecommunication system. In addition to investigating the two classification methods, a brief relative comparison of the techniques is also presented. It is indicated that the level of classification accuracy and model-robustness observed for the case study would be beneficial in achieving high software reliability of its subsequent system releases. Similar observations are made from our empirical studies with other case studies. 相似文献

6.

基于Box-Cox转换的集成跨项目软件缺陷预测方法

王莉萍陈翔王秋萍赵英全《计算机应用研究》2017,34(7)

软件缺陷预测通过预先识别出被测项目内的潜在缺陷程序模块,可以优化测试资源的分配并提高软件产品的质量。论文对跨项目缺陷预测问题展开了深入研究,在源项目实例选择时,考虑了三种不同的实例相似度计算方法,并发现这些方法的缺陷预测结果存在多样性,因此提出了一种基于Box-Cox转换的集成跨项目软件缺陷预测方法BCEL,具体来说,首先基于不同的实例相似度计算方法,从候选集中选出不同的训练集,随后针对这些数据集,进行针对性的Box-Cox转化,并借助特定分类方法构造出不同的基分类器,最后将这三个基分类器进行有效集成。基于实际项目的数据集,验证了BCEL方法的有效性,并深入分析了BCEL方法内的影响因素对缺陷预测性能的影响。相似文献

7.

Evolutionary Sampling and Software Quality Modeling of High-Assurance Systems 总被引：1，自引：0，他引：1

《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2009,39(5):1097-1107

Software quality modeling for high-assurance systems, such as safety-critical systems, is adversely affected by the skewed distribution of fault-prone program modules. This sparsity of defect occurrence within the software system impedes training and performance of software quality estimation models. Data sampling approaches presented in data mining and machine learning literature can be used to address the imbalance problem. We present a novel genetic algorithm-based data sampling method, named Evolutionary Sampling, as a solution to improving software quality modeling for high-assurance systems. The proposed solution is compared with multiple existing data sampling techniques, including random undersampling, one-sided selection, Wilson's editing, random oversampling, cluster-based oversampling, Synthetic Minority Oversampling Technique (SMOTE), and Borderline-SMOTE. This paper involves case studies of two real-world software systems and builds C4.5- and RIPPER-based software quality models both before and after applying a given data sampling technique. It is empirically shown that Evolutionary Sampling improves performance of software quality models for high-assurance systems and is significantly better than most existing data sampling techniques. 相似文献

8.

Software measurement: Problems and practice 总被引：1，自引：0，他引：1

John C. Munson 《Annals of Software Engineering》1995,1(1):255-285

Software measurement is a key element in the evolving software engineering discipline. This paper seeks to identify some of the principal problems surrounding the measurement process and the metrics themselves. Metrics are shown in various modeling contexts. Measures of static software complexity are explored together with measures of dynamic program complexity. These data are then used to identify regions of software that are fault prone. This information is then exploited to develop a model of dynamic program complexity for the identification of failure prone software. Empirical experiences with the practice of measurement from an ongoing research project on the Space Shuttle Primary Avionics Software System are employed to augment the discussion. 相似文献

9.

Thresholds based outlier detection approach for mining class outliers: An empirical case study on software measurement datasets

Oral Alan Cagatay Catal 《Expert systems with applications》2011,38(4):3440-3445

Predicting the fault-proneness labels of software program modules is an emerging software quality assurance activity and the quality of datasets collected from previous software version affects the performance of fault prediction models. In this paper, we propose an outlier detection approach using metrics thresholds and class labels to identify class outliers. We evaluate our approach on public NASA datasets from PROMISE repository. Experiments reveal that this novel outlier detection method improves the performance of robust software fault prediction models based on Naive Bayes and Random Forests machine learning algorithms. 相似文献

10.

基于深度自编码网络的软件缺陷预测方法

周末徐玲杨梦宁廖胜平鄢萌《计算机工程与科学》2018,40(10):1796-1804

软件缺陷预测是提升软件质量的有效方法,而软件缺陷预测方法的预测效果与数据集自身的特点有着密切的相关性。针对软件缺陷预测中数据集特征信息冗余、维度过大的问题,结合深度学习对数据特征强大的学习能力,提出了一种基于深度自编码网络的软件缺陷预测方法。该方法首先使用一种基于无监督学习的采样方法对6个开源项目数据集进行采样,解决了数据集中类不平衡问题;然后训练出一个深度自编码网络模型。该模型能对数据集进行特征降维,模型的最后使用了三种分类器进行连接,该模型使用降维后的训练集训练分类器,最后用测试集进行预测。实验结果表明,该方法在维数较大、特征信息冗余的数据集上的预测性能要优于基准的软件缺陷预测模型和基于现有的特征提取方法的软件缺陷预测模型,并且适用于不同分类算法。相似文献

11.

An investigation on the feasibility of cross-project defect prediction

Zhimin He Fengdi Shu Ye Yang Mingshu Li Qing Wang 《Automated Software Engineering》2012,19(2):167-199

Software defect prediction helps to optimize testing resources allocation by identifying defect-prone modules prior to testing. Most existing models build their prediction capability based on a set of historical data, presumably from the same or similar project settings as those under prediction. However, such historical data is not always available in practice. One potential way of predicting defects in projects without historical data is to learn predictors from data of other projects. This paper investigates defect predictions in the cross-project context focusing on the selection of training data. We conduct three large-scale experiments on 34 data sets obtained from 10 open source projects. Major conclusions from our experiments include: (1) in the best cases, training data from other projects can provide better prediction results than training data from the same project; (2) the prediction results obtained using training data from other projects meet our criteria for acceptance on the average level, defects in 18 out of 34 cases were predicted at a Recall greater than 70% and a Precision greater than 50%; (3) results of cross-project defect predictions are related with the distributional characteristics of data sets which are valuable for training data selection. We further propose an approach to automatically select suitable training data for projects without historical data. Prediction results provided by the training data selected by using our approach are comparable with those provided by training data from the same project. 相似文献

12.

An empirical study of predicting software faults with case-based reasoning 总被引：1，自引：0，他引：1

Taghi M. Khoshgoftaar Naeem Seliya Nandini Sundaresh 《Software Quality Journal》2006,14(2):85-111

The resources allocated for software quality assurance and improvement have not increased with the ever-increasing need for better software quality. A targeted software quality inspection can detect faulty modules and reduce the number of faults occurring during operations. We present a software fault prediction modeling approach with case-based reasoning (CBR), a part of the computational intelligence field focusing on automated reasoning processes. A CBR system functions as a software fault prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for fault prediction, and a solution algorithm. The selection of a particular similarity function and solution algorithm may affect the performance accuracy of a CBR-based software fault prediction system. This paper presents an empirical study investigating the effects of using three different similarity functions and two different solution algorithms on the prediction accuracy of our CBR system. The influence of varying the number of nearest neighbor cases on the performance accuracy is also explored. Moreover, the benefits of using metric-selection procedures for our CBR system is also evaluated. Case studies of a large legacy telecommunications system are used for our analysis. It is observed that the CBR system using the Mahalanobis distance similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the CBR models have better performance than models based on multiple linear regression. Taghi M. Khoshgoftaar is a professor of the Department of Computer Science and Engineering, Florida Atlantic University and the Director of the Empirical Software Engineering Laboratory. His research interests are in software engineering, software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, and statistical modeling. He has published more than 200 refereed papers in these areas. He has been a principal investigator and project leader in a number of projects with industry, government, and other research-sponsoring agencies. He is a member of the Association for Computing Machinery, the IEEE Computer Society, and IEEE Reliability Society. He served as the general chair of the 1999 International Symposium on Software Reliability Engineering (ISSRE’99), and the general chair of the 2001 International Conference on Engineering of Computer Based Systems. Also, he has served on technical program committees of various international conferences, symposia, and workshops. He has served as North American editor of the Software Quality Journal, and is on the editorial boards of the journals Empirical Software Engineering, Software Quality, and Fuzzy Systems. Naeem Seliya received the M.S. degree in Computer Science from Florida Atlantic University, Boca Raton, FL, USA, in 2001. He is currently a Ph.D. candidate in the Department of Computer Science and Engineering at Florida Atlantic University. His research interests include software engineering, computational intelligence, data mining, software measurement, software reliability and quality engineering, software architecture, computer data security, and network intrusion detection. He is a student member of the IEEE Computer Society and the Association for Computing Machinery. 相似文献

13.

Accuracy of software quality models over multiple releases

Taghi M. Khoshgoftaar Edward B. Allen Wendell D. Jones John P. Hudepohl 《Annals of Software Engineering》2000,9(1-2):103-116

Many evolving mission‐critical systems must have high software reliability. However, it is often difficult to identify fault‐prone modules early enough in a development cycle to guide software enhancement efforts effectively and efficiently. Software quality models can yield timely predictions of membership in the fault‐prone class on a module‐by‐module basis, enabling one to target enhancement techniques. However, it is an open empirical question, “Can a software quality model remain useful over several releases?” Most prior software quality studies have examined only one release of a system, evaluating the model with modules from the same release. We conducted a case study of a large legacy telecommunications system where measurements on one software release were used to build models, and three subsequent releases of the same system were used to evaluate model accuracy. This is a realistic assessment of model accuracy, closely simulating actual use of a software quality model. A module was considered fault‐prone if any of its faults were discovered by customers. These faults are extremely expensive due to consequent loss of service and emergency repair efforts. We found that the model maintained useful accuracy over several releases. These findings are initial empirical evidence that software quality models can remain useful as a system is maintained by a stable software development process. 相似文献

14.

Sample-based software defect prediction with active and semi-supervised learning

Ming Li Hongyu Zhang Rongxin Wu Zhi-Hua Zhou 《Automated Software Engineering》2012,19(2):201-230

Software defect prediction can help us better understand and control software quality. Current defect prediction techniques are mainly based on a sufficient amount of historical project data. However, historical data is often not available for new projects and for many organizations. In this case, effective defect prediction is difficult to achieve. To address this problem, we propose sample-based methods for software defect prediction. For a large software system, we can select and test a small percentage of modules, and then build a defect prediction model to predict defect-proneness of the rest of the modules. In this paper, we describe three methods for selecting a sample: random sampling with conventional machine learners, random sampling with a semi-supervised learner and active sampling with active semi-supervised learner. To facilitate the active sampling, we propose a novel active semi-supervised learning method ACoForest which is able to sample the modules that are most helpful for learning a good prediction model. Our experiments on PROMISE datasets show that the proposed methods are effective and have potential to be applied to industrial practice. 相似文献

15.

The impact of software evolution and reuse on software quality

Taghi M. Khoshgoftaar Edward B. Allen Kalai S. Kalaichelvan Nishith Goel 《Empirical Software Engineering》1996,1(1):31-44

This paper presents a case study of a software project in the maintenance phase. The case study was based on a sample of modules, representing about 1.3 million lines of code, from a very large telecommunications system. Software quality models were developed to predict the number of faults expected from the coding through operations phases. Since modules from the prior release were often reused to develop a new release, one model incorporated reuse data as additional independent variables. We compare this model's performance to a similar model without reuse data.Software quality models often have product metrics as the only input data for predicting quality. There is an implicit assumption that all the modules have had a similar development history, so that product attributes are the primary drivers of different quality levels. Reuse of software as components and software evolution do not fit this assumption very well, and consequently, traditional models for such environments may not have adequate accuracy. Focusing on the software maintenance phase, this study demonstrated that reuse data can significantly improve the predictive accuracy of software quality models. 相似文献

16.

Non‐negative sparse‐based SemiBoost for software defect prediction

Tiejian Wang Zhiwu Zhang Xiaoyuan Jing Yanli Liu 《Software Testing, Verification and Reliability》2016,26(7):498-515

Software defect prediction is an important decision support activity in software quality assurance. The limitation of the labelled modules usually makes the prediction difficult, and the class‐imbalance characteristic of software defect data leads to negative influence on decision of classifiers. Semi‐supervised learning can build high‐performance classifiers by using large amount of unlabelled modules together with the labelled modules. Ensemble learning achieves a better prediction capability for class‐imbalance data by using a series of weak classifiers to reduce the bias generated by the majority class. In this paper, we propose a new semi‐supervised software defect prediction approach, non‐negative sparse‐based SemiBoost learning. The approach is capable of exploiting both labelled and unlabelled data and is formulated in a boosting framework. In order to enhance the prediction ability, we design a flexible non‐negative sparse similarity matrix, which can fully exploit the similarity of historical data by incorporating the non‐negativity constraint into sparse learning for better learning the latent clustering relationship among software modules. The widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that non‐negative sparse‐based SemiBoost learning outperforms several representative state‐of‐the‐art semi‐supervised software defect prediction methods. Copyright © 2016 John Wiley & Sons, Ltd. 相似文献

17.

Software Defect Detection with R<Emphasis Type="SmallCaps">ocus</Emphasis>

下载免费PDF全文

姜远黎铭周志华《计算机科学技术学报》2011,26(2):328-342

Software defect detection aims to automatically identify defective software modules for efficient software test in order to improve the quality of a software system.Although many machine learning methods have been successfully applied to the task,most of them fail to consider two practical yet important issues in software defect detection.First,it is rather difficult to collect a large amount of labeled training data for learning a well-performing model;second,in a software system there are usually much fewer defective modules than defect-free modules,so learning would have to be conducted over an imbalanced data set.In this paper,we address these two practical issues simultaneously by proposing a novel semi-supervised learning approach named Rocus.This method exploits the abundant unlabeled examples to improve the detection accuracy,as well as employs under-sampling to tackle the class-imbalance problem in the learning process.Experimental results of real-world software defect detection tasks show that Rocus is effective for software defect detection.Its performance is better than a semi-supervised learning method that ignores the class-imbalance nature of the task and a class-imbalance learning method that does not make effective use of unlabeled data. 相似文献

18.

面向软件缺陷数据聚类分析的数据降维处理方法

万琳范秋灵《计算机系统应用》2015,24(3):207-213

面向软件缺陷数据的聚类分析就是按照一定的准则将不同的软件缺陷数据对象划分为多个类,使得类内的缺陷数据相似,类间的缺陷数据相异,其意义在于发现软件缺陷的分布规律,有针对性地制定测试方案,优化测试过程.针对传统K-Means方法聚类结果依赖样本初始空间分布的问题,提出一种基于PSO算法的数据降维处理方法 DRPS.仿真实验表明,经过该方法降维处理后数据的聚类准确率及聚类质量都有了一定程度的提高. 相似文献

19.

软件规模估算方法的研究

秦放李孝贵王立娟《计算机与数字工程》2012,40(5):18-19,22

要软件规模估算是整个项目的基础,是项目成功的核心。目前流行的软件规模估算方法主要有：专家小组判断法、类比法、基于代理的估算法。该文对以上三种方法进行了初步探究,并在此基础上进一步分析了每种方法的优缺点及适用情况,最后提出了软件规模估算方法下一步可能的发展方向。相似文献

20.

基于改进模糊C均值的软件缺陷预测研究

张焯徐玲杨丹《计算机工程与应用》2015,51(7):136-140

软件缺陷预测用来预测软件系统各个模块中是否存在BUG。传统的软件缺陷预测技术研究主要局限在有监督方法上,这类方法需要大量的已标注数据进行训练,但在工程实际中,这类标签数据不易获取。提出了一种结合模拟退火和遗传算法的改进模糊C均值算法,以解决模糊C均值容易受初始聚类中心影响而收敛到局部最优的缺陷。实验结果表明提出的方法在软件缺陷预测中具备高鲁棒性和较高预测精度。相似文献