首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
软件缺陷预测通过预先识别出被测项目内的潜在缺陷程序模块,可以优化测试资源的分配并提高软件产品的质量。论文对跨项目缺陷预测问题展开了深入研究,在源项目实例选择时,考虑了三种不同的实例相似度计算方法,并发现这些方法的缺陷预测结果存在多样性,因此提出了一种基于Box-Cox转换的集成跨项目软件缺陷预测方法BCEL,具体来说,首先基于不同的实例相似度计算方法,从候选集中选出不同的训练集,随后针对这些数据集,进行针对性的Box-Cox转化,并借助特定分类方法构造出不同的基分类器,最后将这三个基分类器进行有效集成。基于实际项目的数据集,验证了BCEL方法的有效性,并深入分析了BCEL方法内的影响因素对缺陷预测性能的影响。  相似文献   

2.
何吉元  孟昭鹏  陈翔  王赞  樊向宇 《软件学报》2017,28(6):1455-1473
软件缺陷预测方法可以在项目的开发初期,通过预先识别出所有可能含有缺陷的软件模块来优化测试资源的分配。早期的缺陷预测研究大多集中于同项目缺陷预测,但同项目缺陷预测需要充足的历史数据,而在实际应用中可能需要预测的项目的历史数据较为稀缺,或这个项目是一个全新项目。因此跨项目缺陷预测问题成为当前软件缺陷预测领域内的一个研究热点,其研究挑战在于源项目与目标项目数据集间存在的分布差异性以及数据集内存在的类不平衡问题。受到基于搜索的软件工程思想的启发,论文提出了一种基于搜索的半监督集成跨项目软件缺陷预测方法S3EL。该方法首先通过调整训练集中各类数据的分布比例,构建出多个朴素贝叶斯基分类器,随后利用具有全局搜索能力的遗传算法,基于少量已标记目标实例对上述基分类器进行集成,并构建出最终的缺陷预测模型。在Promise数据集及AEEEM数据集上和多个经典的跨项目缺陷预测方法(Burak过滤法、Peters过滤法、TCA+、CODEP及HYDRA)进行了对比。以F1值作为评测指标,结果表明在大部分情况下,S3EL方法可以取得最好的预测性能。  相似文献   

3.
安全缺陷报告可以描述软件产品中的安全关键漏洞.为了消除软件产品的安全攻击风险,安全缺陷报告(security bug report, SBR)预测越来越受到研究人员的关注.但在实际软件开发场景中,需要进行软件安全漏洞预测的项目可能是来自新公司或属于新启动的项目,没有足够的已标记安全缺陷报告供在实践中构建此软件安全漏洞预测模型.一种简单的解决方案就是使用迁移模型,即利用其他项目已经标记过的数据来构建预测模型.受到该领域最近的两项研究工作的启发,以安全关键字过滤为思路提出一种融合知识图谱的跨项目安全缺陷报告预测方法KG-SBRP (knowledge graph of security bug report prediction).使用安全缺陷报告中的文本信息域结合CWE(common weakness enumeration)与CVE Details (common vulnerabilities and exposures)共同构建三元组规则实体,以三元组规则实体构建安全漏洞知识图谱,在图谱中结合实体及其关系识别安全缺陷报告.将数据分为训练集和测试集进行模型拟合和性能评估.所构建的模型...  相似文献   

4.
To facilitate developers in effective allocation of their testing and debugging efforts, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict classes that are more likely to be buggy based on the past history of classes, methods, or certain other code elements. These techniques are effective provided that a sufficient amount of data is available to train a prediction model. However, sufficient training data are rarely available for new software projects. To resolve this problem, cross-project defect prediction, which transfers a prediction model trained using data from one project to another, was proposed and is regarded as a new challenge in the area of defect prediction. Thus far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this study, we investigated seven composite algorithms that integrate multiple machine learning classifiers to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, we performed experiments on 10 open-source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. We compared the composite algorithms with the combined defect predictor where logistic regression is used as the meta classification algorithm (CODEP Logistic ), which is the most recent cross-project defect prediction algorithm in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEP Logistic : Maximum voting shows the best performance in terms of F-measure and its average F-measure is superior to that of CODEP Logistic by 36.88%. Bootstrap aggregation (BaggingJ48) shows the best performance in terms of cost effectiveness and its average cost effectiveness is superior to that of CODEP Logistic by 15.34%.  相似文献   

5.
Just-in-time defect prediction can remind software developers and managers to verify and fix bugs at the moment they appeared, thus improving the effectiveness and validity of bug fixing. Existing studies mainly focus on just-in-time prediction for software files (JIT-F). JIT-F is a binary classification problem, which classifies (hence predicts) a file change as buggy or clean. This article provides a detailed analysis of just-in-time defect prediction for software hunks (JIT-H), which predicts bugs at a finer level of granularity, and hence further improves the efficiency of bug fixing. Classification is performed using the ensemble technique of bagging—aggregated combinations of random under sampling plus multiple classifiers (J48 and Random Forest). An empirical study with 10 open source projects was conducted to validate the effectiveness of JIT-H. Experimental results show that JIT-H is effective at predicting defects in software hunk changes. Compared with JIT-F, JIT-H is more cost effective. Additionally, analysis on the change features indicates that Text Vector features and hunk change level features are of more importance than features in other groups and levels.  相似文献   

6.
在跨项目软件缺陷预测中,源项目与目标项目的特征关联度与实例分布差异性是影响预测模型性能的主要因素。本文从特征过滤与实例迁移2个角度出发,提出一种跨项目软件缺陷预测框架KCF-KMM(K-medoids Cluster Filtering- Kernel Mean Matching)。在特征过滤阶段,该方法基于K-medoids聚类算法来筛选特征子集,过滤与目标项目关联度低的特征。在实例迁移阶段,通过KMM算法计算源项目与目标项目实例间的分布差异度,以此分配每个训练实例的影响权重。最后,结合目标项目中少量有标注数据建立混合缺陷预测模型。为了验证KCF-KMM的有效性,本文从准确率和F1值的角度出发,分别与经典的跨项目软件缺陷预测方法TCA+、TNB和NNFilter相比,KCF-KMM的预测性能在Apache数据集上可以分别提升34.1%、0.8%、21.1%和14.4%、3.7%、10.6%。  相似文献   

7.
唐艳武  蒋凡 《计算机工程》2010,36(17):51-53,56
针对自动提取软件漏洞模式方法对漏洞模式的描述不太精确的问题,提出利用软件的程序依赖图表示上下文相关的软件漏洞模式。通过从软件开发历史库中提取相邻版本进行自动比对,生成与修补对应的依赖子图表示漏洞模式。对开源软件Apache代码库进行实验,获得316个有效漏洞模式,利用它们在Apache2.2.8中查找并进行人工分析,确认得到13处未知可疑漏洞,证明了该方法的有效性。  相似文献   

8.
Complexity, cohesion and coupling have been recognized as prominent indicators of software quality. One characterization of software complexity is the existence of dependency relationships. Moreover, the degree of dependency reflects the cohesion and coupling between software elements. Dependencies in the design and implementation phase have been proven to be important predictors of software bugs. We empirically investigated how requirements dependencies correlate with and predict software integration bugs, which can provide early estimates regarding software quality and thus facilitate decision making early in the software lifecycle. We conducted network analysis on the requirements dependency networks of three commercial software projects. Significant correlation is observed between most of our network measures and the number of bugs. Furthermore, many network measures demonstrate significantly greater values for higher severity (or a higher fixing workload). Afterward, we built bug prediction models using these network measures and found that bugs can be predicted with high accuracy and sensitivity, even in cross-project and cross-company contexts. We further identified the dependency type that contributes most to bug correlation, as well as the network measures that contribute more to bug prediction. These observations show that the requirements dependency network can be used as an early indicator and predictor of software integration bugs.  相似文献   

9.
Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learning classification. In particular, we study the state-of-the-art ensemble learner Stacked Generalization (SG) that combines several classifiers. We collect more than 50,000 bug reports from five development projects from two companies in different domains. We implement automated bug assignment and evaluate the performance in a set of controlled experiments. We show that SG scales to large scale industrial application and that it outperforms the use of individual classifiers for bug assignment, reaching prediction accuracies from 50 % to 89 % when large training sets are used. In addition, we show how old training data can decrease the prediction accuracy of bug assignment. We advice industry to use SG for bug assignment in proprietary contexts, using at least 2,000 bug reports for training. Finally, we highlight the importance of not solely relying on results from cross-validation when evaluating automated bug assignment.  相似文献   

10.
Empirical studies indicate that automating the bug assignment process has the potential to significantly reduce software evolution effort and costs. Prior work has used machine learning techniques to automate bug assignment but has employed a narrow band of tools which can be ineffective in large, long-lived software projects. To redress this situation, in this paper we employ a comprehensive set of machine learning tools and a probabilistic graph-based model (bug tossing graphs) that lead to highly-accurate predictions, and lay the foundation for the next generation of machine learning-based bug assignment. Our work is the first to examine the impact of multiple machine learning dimensions (classifiers, attributes, and training history) along with bug tossing graphs on prediction accuracy in bug assignment. We validate our approach on Mozilla and Eclipse, covering 856,259 bug reports and 21 cumulative years of development. We demonstrate that our techniques can achieve up to 86.09% prediction accuracy in bug assignment and significantly reduce tossing path lengths. We show that for our data sets the Naïve Bayes classifier coupled with product–component features, tossing graphs and incremental learning performs best. Next, we perform an ablative analysis by unilaterally varying classifiers, features, and learning model to show their relative importance of on bug assignment accuracy. Finally, we propose optimization techniques that achieve high prediction accuracy while reducing training and prediction time.  相似文献   

11.
With the development of smartphones, mobile applications play an irreplaceable role in our daily life, which characteristics often commit code changes to meet new requirements. This characteristic can introduce defects into the software. To provide immediate feedback to developers, previous researchers began to focus on just-in-time (JIT) software defect prediction techniques. JIT defect prediction aims to determine whether code commits will introduce defects into the software. It contains two scenarios, within-project JIT defect prediction and cross-project JIT defect prediction. Regardless of whether within-project JIT defect prediction or cross-project JIT defect prediction all need to have enough labeled data (within-project JIT defect prediction assumes that have plenty of labeled data from the same project, while cross-project JIT defect prediction assumes that have sufficient labeled data from source projects). However, in practice, both the source and target projects may only have limited labeled data. We propose the MTL-DNN method based on multi-task learning to solve this question. This method contains the data preprocessing layer, input layer, shared layers, task-specific layers, and output layer. Where the common features of multiple related tasks are learned by sharing layers, and the unique features of each task are learned by the task-specific layers. For verifying the effectiveness of the MTL-DNN approach, we evaluate our method on 15 Android mobile apps. The experimental results show that our method significantly outperforms the state-of-the-art single-task deep learning and classical machine learning methods. This result shows that the MTL-DNN method can effectively solve the problem of insufficient labeled training data for source and target projects.  相似文献   

12.
软件缺陷预测是提高软件测试效率、保证软件可靠性的重要途径,已经成为目前实证软件工程领域的研究热点。在软件工程中,软件的开发过程或技术平台可能随时变化,特别是遇到新项目启动或旧项目重新开发时,基于目标项目数据的传统软件缺陷预测方法无法满足实践需求。基于迁移学习技术采用其他项目中已经标注的软件数据实现跨项目的缺陷预测,可以有效解决传统方法的不足,引起了国内外研究者的极大关注,并取得了一系列的研究成果。首先总结了跨项目软件缺陷预测中的关键问题。然后根据迁移学习的技术特点将现有方法分为基于软件属性特征迁移和软件模块实例迁移两大类,并分析比较了常见方法的特点和不足。最后探讨了跨项目软件缺陷预测未来的发展方向。  相似文献   

13.
陈睿  杨孟飞  郭向英 《软件学报》2016,27(3):547-561
在航天嵌入式软件等中断驱动型软件中,中断数据竞争问题十分突出.然而中断在并发语义、同步机制、调度机制等方面与线程(任务)有诸多不同,具有Ad-hoc特征,难以统一刻画,因此主流的数据竞争检测方法并不适用.以航天嵌入式软件数据竞争案例库为基础进行了系统分析,提出刻画有害中断数据竞争的7种缺陷模式.针对其中最常见且最难解决的单变量访问序模式,基于抽象解释提出一种支持过程间分析、中断并发分析的高效检测方法.设计并实现了相应的检测工具SpaceDRC.实验表明,SpaceDRC能够在145毫秒内检测出约21400行程序中的真实数据竞争.SpaceDRC已经在多个航天重点型号中进行了应用,使得中断数据竞争专项分析的效率提高了至少5倍,并且降低了问题遗漏率.  相似文献   

14.
原子  于莉莉  刘超 《软件学报》2014,25(11):2499-2517
软件在其生命周期中不断地发生变更,以适应需求和环境的变化。为了及时预测每次变更是否引入了缺陷,研究者们提出了面向软件源代码变更的缺陷预测方法。然而现有方法存在以下3点不足:(1)仅实现了较粗粒度(事务级和源文件级变更)的预测;(2)仅采用向量空间模型表征变更,没有充分挖掘蕴藏在软件库中的程序结构、自然语言语义以及历史等信息;(3)仅探讨较短时间范围内的预测,未考虑在长时间软件演化过程中由于新需求或人员重组等外界因素所带来的概念漂移问题。针对现有的不足,提出一种面向源代码变更的缺陷预测方法。该方法将细粒度(语句级)变更作为预测对象,从而有效降低了质量保证成本;采用程序静态分析和自然语言语义主题推断相结合的技术深入挖掘软件库,从变更的上下文、内容、时间以及人员4个方面构建特征集,从而揭示了变更易于引入缺陷的因素;采用特征熵差值矩阵分析了软件演化过程中概念漂移问题的特点,并通过一种伴随概念回顾的动态窗口学习机制实现了长时间的稳定预测。通过6个著名开源软件验证了该方法的有效性。  相似文献   

15.
Software defect prediction aims to predict the defect proneness of new software modules with the historical defect data so as to improve the quality of a software system. Software historical defect data has a complicated structure and a marked characteristic of class-imbalance; how to fully analyze and utilize the existing historical defect data and build more precise and effective classifiers has attracted considerable researchers’ interest from both academia and industry. Multiple kernel learning and ensemble learning are effective techniques in the field of machine learning. Multiple kernel learning can map the historical defect data to a higher-dimensional feature space and make them express better, and ensemble learning can use a series of weak classifiers to reduce the bias generated by the majority class and obtain better predictive performance. In this paper, we propose to use the multiple kernel learning to predict software defect. By using the characteristics of the metrics mined from the open source software, we get a multiple kernel classifier through ensemble learning method, which has the advantages of both multiple kernel learning and ensemble learning. We thus propose a multiple kernel ensemble learning (MKEL) approach for software defect classification and prediction. Considering the cost of risk in software defect prediction, we design a new sample weight vector updating strategy to reduce the cost of risk caused by misclassifying defective modules as non-defective ones. We employ the widely used NASA MDP datasets as test data to evaluate the performance of all compared methods; experimental results show that MKEL outperforms several representative state-of-the-art defect prediction methods.  相似文献   

16.
Network measures are useful for predicting fault-prone modules. However, existing work has not distinguished faults according to their severity. In practice, high severity faults cause serious problems and require further attention. In this study, we explored the utility of network measures in high severity faultproneness prediction. We constructed software source code networks for four open-source projects by extracting the dependencies between modules. We then used univariate logistic regression to investigate the associations between each network measure and fault-proneness at a high severity level. We built multivariate prediction models to examine their explanatory ability for fault-proneness, as well as evaluated their predictive effectiveness compared to code metrics under forward-release and cross-project predictions. The results revealed the following: (1) most network measures are significantly related to high severity fault-proneness; (2) network measures generally have comparable explanatory abilities and predictive powers to those of code metrics; and (3) network measures are very unstable for cross-project predictions. These results indicate that network measures are of practical value in high severity fault-proneness prediction.  相似文献   

17.
传统的基于向量空间模型的软件缺陷分派方法,由于存在特征空间维度高、数据稀疏且包含噪音等问题,分派准确率较低。为此,提出一种基于隐含狄利克雷分配(LDA)主题模型的软件缺陷分派方法,将缺陷报告从原始的高维文本单词空间映射到低维语义主题空间,在新的低维主题空间上进行分派。实验结果表明,在使用SVM和KNN分类器时,该方法的分派准确率较高。  相似文献   

18.
针对跨项目软件缺陷预测过程中,软件缺陷数据存在无关信息或数据冗余等问题,提出融合多策略特征筛选的跨项目软件缺陷预测(cross-project software defect prediction based on Multi-Policy Feature Filtering,MPFF)方法。采用多策略筛选方法与过采样方法进行数据预处理;使用代价敏感的域自适应方法进行分类,分类过程使用少量已标记目标项目数据改善项目间分布差异;在AEEEM、NASA MDP及SOFTLAB数据集上进行了不同度量下预测实验。实验结果表明,在同构度量下MPFF方法相比Burank filter、Peters filter、TCA+和TrAdaBoost方法预测效果最佳。  相似文献   

19.
Software metrics rarely follow a normal distribution. Therefore, software metrics are usually transformed prior to building a defect prediction model. To the best of our knowledge, the impact that the transformation has on cross-project defect prediction models has not been thoroughly explored. A cross-project model is built from one project and applied on another project. In this study, we investigate if cross-project defect prediction is affected by applying different transformations (i.e., log and rank transformations, as well as the Box-Cox transformation). The Box-Cox transformation subsumes log and other power transformations (e.g., square root), but has not been studied in the defect prediction literature. We propose an approach, namely Multiple Transformations (MT), to utilize multiple transformations for cross-project defect prediction. We further propose an enhanced approach MT+ to use the parameter of the Box-Cox transformation to determine the most appropriate training project for each target project. Our experiments are conducted upon three publicly available data sets (i.e., AEEEM, ReLink, and PROMISE). Comparing to the random forest model built solely using the log transformation, our MT+ approach improves the F-measure by 7, 59 and 43% for the three data sets, respectively. As a summary, our major contributions are three-fold: 1) conduct an empirical study on the impact that data transformation has on cross-project defect prediction models; 2) propose an approach to utilize the various information retained by applying different transformation methods; and 3) propose an unsupervised approach to select the most appropriate training project for each target project.  相似文献   

20.
软件缺陷预测可帮助开发人员提前预测缺陷程序,合理分配有限的测试资源。软件缺陷预测的准确度不仅依赖于预测方法的选择,更依赖于软件的度量指标。因此,结合多元度量指标进行软件缺陷预测已成为当前的研究热点。从度量指标出发,对传统度量指标、多元度量指标以及结合多元度量指标的缺陷预测的研究进展进行了系统介绍。主要工作包含:介绍了传统的代码和过程度量指标、基于传统度量指标的软件缺陷预测模型以及影响数据质量的因素;阐述了语义结构度量指标;分析列举了当前用于软件缺陷预测的评价指标;结合预测粒度、传统度量指标、语义结构度量指标、跨项目软件缺陷预测对多元度量指标软件缺陷预测未来的研究趋势进行了展望。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号