1.

刁旭炀吴凯陈都周俊峰高璞《计算机测量与控制》2023,31(3):56-62

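Software defect prediction locates code modules that are likely to contain defects, helping developers focus testing and repair. Traditional defect features are static metrics extracted by hand from software size, complexity, and language characteristics. Static metrics, however, cannot directly capture defect information in the program context, which limits prediction performance. To make full use of the syntactic and semantic information in the program context, the paper proposes DP-MHA (Defect Prediction via Mixed Attention Mechanism), a defect prediction method based on a mixed attention mechanism. DP-MHA first extracts AST-based syntactic and semantic sequences from program modules and applies word-embedding and positional encoding; it then uses multi-head attention to learn contextual syntactic and semantic information; finally, it uses a global attention mechanism to extract the key syntactic and semantic features, which are used to build the defect prediction model and identify potentially defective code modules. To validate DP-MHA, the paper selects six open-source Apache Java datasets and compares it with the classic RF-based static-metric method, the unsupervised RBM+RF and DBN+RF methods, and the CNN- and RNN-based deep learning methods. The experimental results show that DP-MHA improves the F1 score over these five baselines by 16.6%, 34.3%, 26.4%, 7.1%, and 4.9%, respectively.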
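The global attention step described in this abstract can be illustrated with a minimal sketch. It is not the DP-MHA implementation: the dot-product scoring, the function names `softmax` and `global_attention_pool`, and the `query` vector are all assumptions made for illustration. The sketch only shows the general idea of weighting token vectors by relevance and summing them into one feature vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention_pool(token_vectors, query):
    """Pool a sequence of token vectors into one vector.

    Assumption: each token is scored by its dot product with a
    (normally learned) query vector; the abstract does not specify
    the scoring function.
    """
    scores = [sum(t * q for t, q in zip(vec, query)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return pooled, weights
```

With a zero query every token receives the same weight, so the pooled vector is the plain average of the token vectors; a query aligned with one token shifts the weight toward it.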

2.

Cross-Project Open-Source Software Defect Prediction Based on Multi-Source Domain Adaptation and Data Augmentation

李光杰唐艺何焱张启磊邢颖赵梦赐《智能安全》2024,3(1):62-73

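Predicting software defects by mining code repository data is an important way to improve software quality and strengthen software security. Many machine-learning methods have been proposed to mine repository defect data for this purpose. However, defect data extracted from different repositories are heterogeneous, so the prediction results of machine learning are often unsatisfactory. This paper therefore proposes a defect prediction method based on multi-source domain adaptation and data augmentation. The method improves accuracy by mining feature similarity between the source repositories and the target repository: on the one hand it uses a weighted maximum mean discrepancy (MMD) to minimize the distance between feature distributions, and on the other it uses an attention mechanism to raise the weights of source repositories that are highly similar to the target repository. Comparative experiments show that the proposed method achieves the best defect prediction performance.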
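The weighted distribution distance mentioned in this abstract can be sketched as follows. The abstract does not give the kernel or the weighting scheme, so this sketch assumes the simplest case: a linear-kernel MMD (the squared distance between feature means) summed over source repositories with caller-supplied weights. The function names are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def linear_mmd2(source, target):
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2."""
    source_mean = mean_vector(source)
    target_mean = mean_vector(target)
    return sum((s - t) ** 2 for s, t in zip(source_mean, target_mean))


def weighted_multi_source_mmd2(sources, target, weights):
    """Weighted sum of per-source squared MMDs against one target.

    Assumption: weights are given by the caller; in the paper they
    would come from the attention mechanism.
    """
    return sum(w * linear_mmd2(src, target) for src, w in zip(sources, weights))
```

A source repository whose feature mean matches the target contributes zero to the sum, so raising its weight does not increase the distance being minimized.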

3.

Research Progress in Software Defect Prediction Techniques

宫丽娜姜淑娟姜丽《软件学报》2019,30(10):3090-3114

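As software grows in scale and complexity, software quality has become a focus of attention. Software defects are the opposite of quality and threaten it, so finding defective modules early in development has become an urgent problem. Software defect prediction mines historical software repositories, designs intrinsic metrics related to defects, and then uses machine learning and other methods to find and lock onto defective modules in advance, so that limited resources can be allocated sensibly. Defect prediction is therefore one of the important routes to software quality assurance and has in recent years become a very important research topic in software engineering. This paper surveys research results on defect prediction techniques, in China and abroad, from the eight years 2010 to 2017 and analyzes them along the main thread of the form that defect prediction takes. It first introduces the framework of defect prediction models; it then classifies, summarizes, and compares existing work from three aspects: defect datasets, model construction methods, and evaluation metrics; finally, it discusses possible future research directions, opportunities, and challenges.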

4.

A Defect Knowledge Base Framework for Airborne Avionics Software

张贺王世海刘斌杨顺昆余正伟秦蕾《测控技术》2013,32(1):99-103

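Improving the quality of airborne avionics software has become an urgent problem. A software defect knowledge base can play an important role in evaluating software quality effectively, predicting software faults, identifying defect-prone modules, and improving both testing efficiency and software quality. The paper proposes a method, and a corresponding framework, for building an airborne avionics software defect knowledge base that combines machine learning with production-system reasoning. The framework also covers the selection criteria and checklist for defect metrics, as well as the requirements and analysis methods for defect statistics. On this basis, using a large amount of airborne avionics software defect data accumulated in actual evaluation work, the authors build a unified, standardized defect knowledge base and use it to derive defect prevention information, providing effective guidance across the full life cycle of airborne avionics software.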

5.

Research on Static Software Defect Prediction Methods (total citations: 7, self-citations: 7, citations by others: 7)

陈翔顾庆刘望舒刘树龙倪超《软件学报》2016,27(1):1-25

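Static software defect prediction is a research hotspot in software engineering data mining. By analyzing software code or the development process, researchers design metrics related to software defects; they then mine historical software repositories to create defect prediction datasets and build defect prediction models that predict potentially defective program modules in the project under test, with the ultimate goal of optimizing the allocation of testing resources and improving product quality. This paper systematically summarizes the results achieved in this field in recent years by researchers in China and abroad. First, it presents a research framework and identifies three important factors that affect prediction performance: the choice of metrics, the methods for building prediction models, and issues related to defect prediction datasets. Next, it summarizes existing results for each of the three factors in turn. It then summarizes existing work on a special class of defect prediction problem, namely defect prediction based on code changes. Finally, it looks ahead to the challenges that future research may face.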

6.

A Survey of Research on Software Defect Data Processing (total citations: 3, self-citations: 0, citations by others: 3)

李宁李战怀《计算机科学》2009,36(8):21-25

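Software defect data are among the most important basic data for software quality analysis and improvement. How to preprocess defect data effectively before analysis, how to classify defect data sensibly according to defect characteristics, and how to mine and statistically analyze defect data are open problems in software defect research. The paper describes in detail the research content, methods, and techniques in three areas (defect data preprocessing, defect classification, and defect data mining and analysis), compares and analyzes these methods, and finally proposes several problems in software defect data processing that need further research.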

7.

A Slice-Level Defect Prediction Method Based on Code Naturalness

张献贲可荣曾杰《软件学报》2021,32(7):2219-2241

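Software defect prediction is an active topic in software quality assurance; it helps developers find potential defects and make better use of resources. How to design more discriminative metrics for prediction systems while balancing performance and interpretability has long been a research goal. To address this challenge, the paper proposes CNDePor, a defect prediction method based on code naturalness features. The method measures code in both the forward and reverse directions and weights samples using quality information to improve...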

8.

Research on a Software Testing Process Based on an Improved V-Model

申晓彦郭佳旭曹春芳杨薇姚素娟王霞邢璐张晔《数字社区&智能家居》2021,(9)

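The software testing process plays an important role in software development. The traditional V-model testing process lags behind development: testing activities begin only after the code is complete, so defects are costly to fix once found. This paper proposes an improved testing process in which three testing steps (determining test requirements, drawing up the test plan, and designing test cases) run in parallel with requirements analysis, high-level design, and detailed design. This helps uncover potential defects early in development and can effectively improve software quality, shorten the development cycle, and reduce development cost.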

9.

Research on Static Software Defect Prediction Methods

田笑常继友张弛荣景峰王子昱张光华王鹤伍高飞胡敬炉张玉清《计算机研究与发展》2023,111(7):1467-1488

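Open-source software defect prediction mines data from historical software repositories and, using defect-related metrics or the syntactic and semantic features of the source code itself, applies machine learning or deep learning to find software defects in advance, reducing repair cost and improving product quality. Vulnerability prediction extracts and labels code modules by mining repositories of software instances and predicts whether new code instances contain vulnerabilities, reducing the cost of finding and fixing them. Based on a review of the software defect prediction literature from 2000 to December 2022, and taking machine learning and deep learning as the entry point, the paper organizes prediction models into those based on software metrics and those based on syntax and semantics. Building on these two classes of model, it analyzes the differences and connections between software defect prediction and vulnerability prediction, and gives a detailed analysis of six frontier topics: dataset sources and processing, code vector representation methods, improvement of pre-trained models, exploration of deep learning models, fine-grained prediction techniques, and model transfer between defect prediction and vulnerability prediction. Finally, it points out future directions for software defect prediction.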

10.

Semi-Supervised Web Mining Based on a Bayes Latent Semantic Model (total citations: 26, self-citations: 0, citations by others: 26)

宫秀军史忠植《软件学报》2002,13(8):1508-1514

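With the growth of information on the Internet, Web mining has become one of the hot topics in data mining research. Web page classification predicts the category of a page by learning from a large number of training samples annotated with categories, and annotating these samples by hand is tedious. Web page clustering groups related pages into one class using a similarity measure, but traditional clustering algorithms search the solution space blindly and lack semantic features. The paper proposes a two-stage semi-supervised text learning strategy. In the first stage, a Bayesian latent semantic model labels the categories of pages that contain latent category topic-word variables. In the second stage, building on the first-stage labels, a naive Bayes model uses the EM (expectation maximization) algorithm to label documents that do not contain latent category topic-word variables. Experimental results show that the algorithm achieves high precision and recall.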

11.

Software quality estimation with limited fault data: a semi-supervised learning perspective

Naeem Seliya Taghi M. Khoshgoftaar 《Software Quality Journal》2007,15(3):327-344

We address the important problem of software quality analysis when there is limited software fault or fault-proneness data. A software quality model is typically trained using software measurement and fault data obtained from a previous release or similar project. Such an approach assumes that fault data is available for all the training modules. Various issues in software development may limit the availability of fault-proneness data for all the training modules. Consequently, the available labeled training dataset is such that the trained software quality model may not provide predictions. More specifically, the small set of modules with known fault-proneness labels is not sufficient for capturing the software quality trends of the project. We investigate semi-supervised learning with the Expectation Maximization (EM) algorithm for software quality estimation with limited fault-proneness data. The hypothesis is that knowledge stored in software attributes of the unlabeled program modules will aid in improving software quality estimation. Software data collected from a large NASA software project is used during the semi-supervised learning process. The software quality model is evaluated with multiple test datasets collected from other NASA software projects. Compared to software quality models trained only with the available set of labeled program modules, the EM-based semi-supervised learning scheme improves generalization performance of the software quality models.

12.

A Sampling-Based Semi-Supervised Support Vector Machine Method for Software Defect Prediction

廖胜平徐玲鄢萌《计算机工程与应用》2017,53(14):161-166

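Software defect prediction helps improve the quality of software development and ensures that testing resources are allocated effectively. To address two problems in defect prediction research, the difficulty of obtaining class-labeled data and class-imbalanced distributions, the paper proposes a sampling-based semi-supervised support vector machine prediction model. The model uses an unsupervised sampling technique to ensure that the number of defective samples among the labeled data is not too low, and a semi-supervised support vector machine to build the prediction model from a small amount of labeled data together with information from unlabeled data. Simulation experiments use the public NASA software defect prediction datasets. The results show that the proposed method outperforms existing semi-supervised methods on both the composite F-measure and recall, and that, compared with supervised methods, it achieves comparable prediction performance with fewer training samples.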
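Several abstracts in this list report the F-measure (F1) and recall. As a reminder of how these are computed for a binary defect label, here is a minimal sketch using the standard definitions; the function name and the convention that 1 marks a defective module are assumptions for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = defective)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is the harmonic mean of precision and recall, a model that flags every module as defective gets perfect recall but a low F1 on imbalanced data, which is why both figures tend to be reported together.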

13.

Software Defect Detection with Rocus

姜远黎铭周志华《计算机科学技术学报》2011,26(2):328-342

Software defect detection aims to automatically identify defective software modules for efficient software test in order to improve the quality of a software system. Although many machine learning methods have been successfully applied to the task, most of them fail to consider two practical yet important issues in software defect detection. First, it is rather difficult to collect a large amount of labeled training data for learning a well-performing model; second, in a software system there are usually much fewer defective modules than defect-free modules, so learning would have to be conducted over an imbalanced data set. In this paper, we address these two practical issues simultaneously by proposing a novel semi-supervised learning approach named Rocus. This method exploits the abundant unlabeled examples to improve the detection accuracy, as well as employs under-sampling to tackle the class-imbalance problem in the learning process. Experimental results of real-world software defect detection tasks show that Rocus is effective for software defect detection. Its performance is better than a semi-supervised learning method that ignores the class-imbalance nature of the task and a class-imbalance learning method that does not make effective use of unlabeled data.
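The under-sampling idea mentioned in this abstract can be illustrated with a generic random under-sampler. This is not the Rocus algorithm, whose details the abstract does not give; it is a minimal sketch, with a hypothetical function name, that keeps every minority-class module and an equally sized random subset of the majority class.

```python
import random


def random_undersample(modules, labels, seed=0):
    """Balance a binary dataset by discarding majority-class examples.

    Assumption: labels are 0 (defect-free) or 1 (defective). All
    minority examples are kept; the majority class is sampled
    without replacement down to the same size.
    """
    rng = random.Random(seed)
    defective = [i for i, y in enumerate(labels) if y == 1]
    clean = [i for i, y in enumerate(labels) if y == 0]
    if len(defective) <= len(clean):
        minority, majority = defective, clean
    else:
        minority, majority = clean, defective
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [modules[i] for i in kept], [labels[i] for i in kept]
```

Fixing the seed makes the sampled subset reproducible, which matters when comparing classifiers trained on the balanced data.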

14.

Sample-based software defect prediction with active and semi-supervised learning

Ming Li Hongyu Zhang Rongxin Wu Zhi-Hua Zhou 《Automated Software Engineering》2012,19(2):201-230

Software defect prediction can help us better understand and control software quality. Current defect prediction techniques are mainly based on a sufficient amount of historical project data. However, historical data is often not available for new projects and for many organizations. In this case, effective defect prediction is difficult to achieve. To address this problem, we propose sample-based methods for software defect prediction. For a large software system, we can select and test a small percentage of modules, and then build a defect prediction model to predict defect-proneness of the rest of the modules. In this paper, we describe three methods for selecting a sample: random sampling with conventional machine learners, random sampling with a semi-supervised learner and active sampling with active semi-supervised learner. To facilitate the active sampling, we propose a novel active semi-supervised learning method ACoForest which is able to sample the modules that are most helpful for learning a good prediction model. Our experiments on PROMISE datasets show that the proposed methods are effective and have potential to be applied to industrial practice.

15.

Non‐negative sparse‐based SemiBoost for software defect prediction

Tiejian Wang Zhiwu Zhang Xiaoyuan Jing Yanli Liu 《Software Testing, Verification and Reliability》2016,26(7):498-515

Software defect prediction is an important decision support activity in software quality assurance. The limitation of the labelled modules usually makes the prediction difficult, and the class‐imbalance characteristic of software defect data leads to negative influence on decision of classifiers. Semi‐supervised learning can build high‐performance classifiers by using large amount of unlabelled modules together with the labelled modules. Ensemble learning achieves a better prediction capability for class‐imbalance data by using a series of weak classifiers to reduce the bias generated by the majority class. In this paper, we propose a new semi‐supervised software defect prediction approach, non‐negative sparse‐based SemiBoost learning. The approach is capable of exploiting both labelled and unlabelled data and is formulated in a boosting framework. In order to enhance the prediction ability, we design a flexible non‐negative sparse similarity matrix, which can fully exploit the similarity of historical data by incorporating the non‐negativity constraint into sparse learning for better learning the latent clustering relationship among software modules. The widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that non‐negative sparse‐based SemiBoost learning outperforms several representative state‐of‐the‐art semi‐supervised software defect prediction methods. Copyright © 2016 John Wiley & Sons, Ltd.

16.

Semi-Supervised Dictionary Learning for Software Defect Prediction Based on Two-Stage Learning

张志武荆晓远吴飞《模式识别与人工智能》2017,30(3):242-250

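When a historical software repository contains few labeled training samples, an effective prediction model is hard to build. To address this problem, the paper proposes a semi-supervised dictionary learning method for software defect prediction based on two-stage learning. In the first stage, a sparse representation classifier assigns probabilistic soft labels to a large number of unlabeled samples and adds them to the labeled training set. In the second stage, discriminative dictionary learning is performed on the expanded training set, and defect-proneness is then predicted with the learned dictionary. Experiments on the NASA MDP and PROMISE AR datasets confirm the advantages of the proposed method.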

17.

Software Quality Analysis of Unlabeled Program Modules With Semisupervised Clustering

Naeem Seliya Taghi M. Khoshgoftaar 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2007,37(2):201-211

Software quality assurance is a vital component of software project development. A software quality estimation model is trained using software measurement and defect (software quality) data of a previously developed release or similar project. Such an approach assumes that the development organization has experience with systems similar to the current project and that defect data are available for all modules in the training data. In software engineering practice, however, various practical issues limit the availability of defect data for modules in the training data. In addition, the organization may not have experience developing a similar system. In such cases, the task of software quality estimation or labeling modules as fault prone or not fault prone falls on the expert. We propose a semisupervised clustering scheme for software quality analysis of program modules with no defect data or quality-based class labels. It is a constraint-based semisupervised clustering scheme that uses k-means as the underlying clustering algorithm. Software measurement data sets obtained from multiple National Aeronautics and Space Administration software projects are used in our empirical investigation. The proposed technique is shown to aid the expert in making better estimations as compared to predictions made when the expert labels the clusters formed by an unsupervised learning algorithm. In addition, the software quality knowledge learnt during the semisupervised process provided good generalization performance for multiple test data sets. An analysis of program modules that remain unlabeled subsequent to our semisupervised clustering scheme provided useful insight into the characteristics of their software attributes.
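To make the idea of constraining k-means with known labels concrete, here is a minimal seeded k-means sketch. It is a simplified stand-in, not the scheme from the paper: the abstract does not spell out its constraints, so this version assumes that a few modules carry expert labels (`seeds`), that each cluster has at least one seed, and that seeded modules never change cluster. All names are hypothetical.

```python
def seeded_kmeans(points, seeds, k, iterations=10):
    """k-means in which seeded points stay in their labeled cluster.

    points: list of feature vectors.
    seeds:  dict mapping point index -> cluster id (expert labels).
    Assumption: every cluster id in range(k) has at least one seed,
    so no cluster can become empty.
    """
    def centroid(members):
        return [sum(column) / len(members) for column in zip(*members)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Initialise each centroid from the seeds of that cluster.
    centroids = [
        centroid([points[i] for i, c in seeds.items() if c == j])
        for j in range(k)
    ]
    assignment = []
    for _ in range(iterations):
        # Seeds keep their label; other points go to the nearest centroid.
        assignment = [
            seeds[i] if i in seeds
            else min(range(k), key=lambda j: dist2(p, centroids[j]))
            for i, p in enumerate(points)
        ]
        centroids = [
            centroid([p for p, a in zip(points, assignment) if a == j])
            for j in range(k)
        ]
    return assignment, centroids
```

For example, with one-dimensional metrics clustered around 0 and 5 and one seed per cluster, the unlabeled modules inherit the label of the seed they sit closest to.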

18.

Data Mining of Software Development Databases

Khoshgoftaar Taghi M. Allen Edward B. Jones Wendell D. Hudepohl John P. 《Software Quality Journal》2001,9(3):161-176

Software quality models can predict which modules will have high risk, enabling developers to target enhancement activities to the most problematic modules. However, many find collection of the underlying software product and process metrics a daunting task. Many software development organizations routinely use very large databases for project management, configuration management, and problem reporting which record data on events during development. These large databases can be an unintrusive source of data for software quality modeling. However, multiplied by many releases of a legacy system or a broad product line, the amount of data can overwhelm manual analysis. The field of data mining is developing ways to find valuable bits of information in very large databases. This aptly describes our software quality modeling situation. This paper presents a case study that applied data mining techniques to software quality modeling of a very large legacy telecommunications software system's configuration management and problem reporting databases. The case study illustrates how useful models can be built and applied without interfering with development.