1.

Handling Class-Imbalance with KNN (Neighbourhood) Under-Sampling for Software Defect Prediction

Goyal Somya 《Artificial Intelligence Review》2022,55(3):2023-2064

Software Defect Prediction (SDP) is a crucial task in the software development process: it forecasts which modules are more prone to errors and faults before the testing phase begins. It aims to reduce software development cost by focusing testing effort on the modules predicted to be faulty. Although SDP helps ensure timely delivery of a good-quality end product, class imbalance in the dataset is a major hindrance to it. This paper proposes a novel Neighbourhood-based Under-Sampling (N-US) algorithm to handle the class-imbalance issue and demonstrates its effectiveness in attaining high accuracy when predicting defective modules. N-US under-samples the dataset to maximize the visibility of minority data points while restricting excessive elimination of majority data points, to avoid information loss. To assess its applicability, N-US is compared with three standard under-sampling techniques. The study further investigates N-US as a trusted ally for SDP classifiers. Extensive experiments are conducted on benchmark datasets from the NASA repository: CM1, JM1, KC1, KC2 and PC1. The proposed SDP classifier with N-US is compared statistically with baseline models to assess the effectiveness of N-US for SDP. The proposed model outperforms the other candidate SDP models, with the highest AUC score (95.6%), the highest accuracy (96.9%) and the ROC curve closest to the top-left corner. It also shows the best prediction power statistically, at a 95% confidence level.
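The abstract does not specify the N-US removal rule. As a rough illustration of neighbourhood-based under-sampling in general, the sketch below drops majority-class (clean) samples that have at least `t` minority-class (defective) samples among their `k` nearest neighbours, clearing the overlap region around defective modules while keeping majority points far from it. The function name, the parameters `k` and `t`, and the rule itself are assumptions, not the published algorithm.

```python
import math


def neighbourhood_undersample(X, y, k=3, t=1, minority=1):
    """Drop majority samples with at least t minority samples among their k nearest neighbours.

    Illustrative sketch only: the rule and the parameters k and t are assumptions,
    not the published N-US algorithm. Brute-force O(n^2) distance computation.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == minority:
            keep.append(i)  # minority (defective) samples are always kept
            continue
        # Sort all other samples by Euclidean distance to sample i
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbours = [j for _, j in dists[:k]]
        minority_hits = sum(1 for j in neighbours if y[j] == minority)
        if minority_hits < t:
            keep.append(i)  # majority sample lies outside the minority neighbourhood
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: label 1 = defective (minority), 0 = clean (majority)
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (0.9, 0.9)]
y = [0, 0, 0, 1, 1, 0, 0]
X_res, y_res = neighbourhood_undersample(X, y, k=3, t=1)
print(y_res)  # [0, 0, 0, 1, 1]
```

Raising `t` removes fewer majority samples, which is one crude way to limit the information loss the abstract warns about.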

2.

Software defect prediction based on an improved deep forest algorithm

Xue Canguan, Yan Xuefeng 《Computer Science》2018,45(8):160-165

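Software defect prediction is an important means of making rational use of software testing resources and improving software performance. Shallow machine-learning algorithms used in defect prediction models cannot mine software data features in depth; to address this, an improved deep forest algorithm, the deep stacked forest (DSF), is proposed. The algorithm first transforms the original software features by random sampling to enhance their expressive power, and then applies a stacked structure to perform layer-by-layer representation learning on the transformed features. DSF was applied to defect prediction on the Eclipse datasets, and the experimental results show that it clearly improves on deep forest in both prediction performance and time efficiency.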

3.

A software defect prediction method based on a surrogate-assisted multi-objective firefly algorithm

Cao Lianglin, Ben Kerong, Zhang Xian 《Computer Engineering & Science》2022,44(2):257-265

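To address the growing complexity of data dimensionality and the class-imbalance problem in software defect prediction, a software defect prediction method based on a surrogate-assisted multi-objective firefly algorithm (SMO-MSFFA) is proposed. The method uses a multi-group-strategy firefly algorithm (MSFFA) whose objectives are to minimize the feature selection ratio and to maximize the model's AUC, and it builds defect prediction models with random forest (RF), support vector machine (SVM) and K-nearest neighbour (KNN) classifiers. Given the iterative nature of evolutionary algorithms, a surrogate model is embedded to compute part of the individual fitness evaluations offline, shortening computation time. Experiments on the PC1, KC1 and MC1 projects of the public NASA dataset show that, compared with NSGA-II, the mean AUC rises by 0.17, falls by 0.01 and rises by 0.09 on PC1, KC1 and MC1 respectively; the mean feature selection ratio falls by 0.08, 0.17 and 0.05; and the mean running time increases by 131 s, decreases by 199 s and decreases by 431 s. The results show that the proposed method has clear advantages in improving model performance, reducing the feature selection ratio and shortening computation time.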
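With two objectives (a lower feature-selection ratio and a higher AUC), candidate feature subsets are typically compared by Pareto dominance. The sketch below shows only that generic dominance check and front filter; the firefly moves and the surrogate model are not reproduced, and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if candidate a Pareto-dominates b.

    Each candidate is (feature_ratio, auc): a lower feature-selection ratio
    and a higher AUC are both better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (feature_ratio, auc) pairs for five feature subsets
candidates = [(0.2, 0.80), (0.4, 0.85), (0.3, 0.78), (0.5, 0.84), (0.1, 0.70)]
print(pareto_front(candidates))  # [(0.2, 0.8), (0.4, 0.85), (0.1, 0.7)]
```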

4.

Software fault prediction using particle swarm algorithm with genetic algorithm and support vector machine classifier

Hiba Alsghaier, Mohammed Akour 《Software》2020,50(4):407-427

Software fault prediction builds models that help developers detect faulty classes or modules in early phases of the development life cycle and identify the modules that need more refactoring in the maintenance phase. Software reliability is the probability of failure-free operation over a period of time, so describing a system as unreliable means that it contains many errors. Such errors may be acceptable in some systems, but they can cause critical problems in safety-critical systems such as aircraft, space shuttles and medical systems. Locating faulty software modules is therefore an essential step, because it identifies the modules that need more refactoring or more testing. In this article, an approach is developed that integrates a genetic algorithm (GA) with a support vector machine (SVM) classifier and a particle swarm algorithm to obtain a better software fault prediction technique. The approach is applied to 24 datasets (12 NASA MDP and 12 Java open-source projects), where NASA MDP is treated as a large-scale dataset and the Java open-source projects as a small-scale dataset. Results indicate that integrating GA with SVM and the particle swarm algorithm improves software fault prediction performance on both large-scale and small-scale datasets and overcomes limitations of previous studies.

5.

A semi-supervised ensemble method for cross-project software defect prediction

He Jiyuan, Meng Zhaopeng, Chen Xiang, Wang Zan, Fan Xiangyu 《Journal of Software》2017,28(6):1455-1473

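Software defect prediction can optimize the allocation of testing resources early in a project by identifying in advance the software modules likely to contain defects. Early defect prediction research focused mostly on within-project defect prediction, which requires sufficient historical data; in practice, however, the project to be predicted may have little historical data or may be entirely new. Cross-project defect prediction has therefore become a research hotspot in the field; its main challenges are the distribution differences between the source and target project datasets and the class imbalance within datasets. Inspired by search-based software engineering, this paper proposes S³EL, a search-based semi-supervised ensemble method for cross-project software defect prediction. The method first builds several naive Bayes base classifiers by adjusting the proportion of each class in the training set, then uses a genetic algorithm, with its global search capability, to combine these base classifiers using a small number of labelled target instances, yielding the final defect prediction model. S³EL was compared on the Promise and AEEEM datasets with several classic cross-project defect prediction methods (the Burak filter, the Peters filter, TCA+, CODEP and HYDRA). With F1 as the evaluation measure, the results show that S³EL achieves the best prediction performance in most cases.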

6.

An activity-centred software process modelling method based on Petri nets

Yin Qin, Wang Xiaoping 《Computer Applications and Software》2008,25(1):187-189,255

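A software process is the set of activities carried out during the software life cycle. Software process models give developers a standard for software development and also ease communication among them. Process modelling, the most important activity in the software process, is a re-engineering of the actual software process. Based on Petri nets, an activity-centred software process control model, ACCM, is proposed together with the corresponding algorithm. Finally, a software development example illustrates the effectiveness of the method.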
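ACCM itself is not described in the abstract, but the Petri-net semantics it builds on can be sketched: a transition (here standing for an activity) is enabled when each of its input places holds enough tokens, and firing it moves tokens from its input places to its output places. The place and activity names (`requirements_ready`, `design`, and so on) are hypothetical, and this is the generic firing rule, not ACCM or its algorithm.

```python
from collections import Counter


class PetriNet:
    """Minimal place/transition net; transitions model process activities."""

    def __init__(self, marking):
        self.marking = Counter(marking)  # place name -> token count
        self.transitions = {}            # transition name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (Counter(inputs), Counter(outputs))

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        # Enabled when every input place holds at least as many tokens as its arc weight
        return all(self.marking[p] >= n for p, n in inputs.items())

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        self.marking.subtract(inputs)  # consume input tokens
        self.marking.update(outputs)   # produce output tokens


# Hypothetical three-activity process: design, then code, then test
net = PetriNet({"requirements_ready": 1})
net.add_transition("design", ["requirements_ready"], ["design_ready"])
net.add_transition("code", ["design_ready"], ["code_ready"])
net.add_transition("test", ["code_ready"], ["tested"])
```

Because `code` needs a token in `design_ready`, the net enforces that coding cannot start before design has completed.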

7.

A novel modified undersampling (MUS) technique for software defect prediction

P. Lingden, Abeer Alsadoon, P.W.C. Prasad, Omar Hisham Alsadoon, Rasha S. Ali, Vinh Tran Quoc Nguyen 《Computational Intelligence》2019,35(4):1003-1020

Background and aim: Many sophisticated data mining and machine learning algorithms have been used for software defect prediction (SDP) to enhance software quality. However, real-world SDP data sets suffer from class imbalance, which biases the classifier and degrades the performance of existing classification algorithms, resulting in inaccurate classification and prediction. This work aims to reduce the class imbalance of data sets in order to increase defect prediction accuracy and decrease processing time. Methodology: The proposed model balances the classes of the data sets to increase prediction accuracy and decrease processing time. It consists of a modified undersampling method and a correlation feature selection (CFS) method. Results: On ten open-source project data sets, the proposed model achieves F1-scores between 0.52 and 0.96; the best value, 0.96, is close to the ideal of 1, indicating near-perfect prediction performance. Conclusion: The proposed model balances the classes of the data sets to increase prediction accuracy and decrease processing time.
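The F1-score reported above is the harmonic mean of precision and recall, which simplifies to F1 = 2TP / (2TP + FP + FN). A minimal sketch, assuming label 1 marks a defective module:

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN), treating `positive` (defective) as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # 0.0 by convention when there are no positives


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(round(f1_score(y_true, y_pred), 3))  # 0.667
```

Unlike plain accuracy, F1 ignores true negatives, so a classifier cannot score well on an imbalanced data set by labelling every module as clean.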

8.

Deep Metallogenic prediction model construction of the Xiongcun no. II orebody based on the DNN algorithm

Zhang Di, Zhou Zhongli, Han Suyue, Gong Hao, Zou Tianyi, Luo Jie 《Multimedia Tools and Applications》2022,81(23):33185-33203

With the continuous mining and gradual depletion of shallow deposits, deep prospecting has become a new global prospecting trend. In addition, with the development of artificial intelligence, deep learning provides a favorable means of analysing geological big data. This paper studies the No. II Orebody of the Xiongcun deposit. First, based on previous research results and metallogenic regularity, prospecting information (lithology, Au-Ag-Cu chemical elements and wall-rock alteration) is extracted, and a block model is established using Kriging interpolation. Second, the data are divided into dataset I and dataset II according to “randomness” and “depth”. Third, deep prospecting prediction models based on a deep neural network (DNN) and a convolutional neural network (CNN) are constructed, and their parameters are optimized. Finally, the models are applied to deep prediction of the Xiongcun No. II Orebody. The results show that the DNN-based prediction model achieves an accuracy of 96.15%, a recall of 89.23% and an AUC of 96.39%, all higher than those of the CNN-based model, indicating that the DNN-based model performs better. The prediction model based on dataset I is more accurate than that based on dataset II. The accuracy of deep metallogenic prediction is approximately 89% with the DNN, approximately 87% with the CNN, and approximately 61.27% with the prospecting information method. The DNN predictions are relatively consistent with the spatial location and scale of the orebody. Therefore, the work in this paper shows that it is feasible to use deep learning for deep mineral prediction.

9.

Research progress on software defect prediction techniques

Gong Lina, Jiang Shujuan, Jiang Li 《Journal of Software》2019,30(10):3090-3114

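As software grows in scale and complexity, software quality has become a focus of attention. Software defects are the opposite of software quality and threaten it, so finding defective modules early in development has become an urgent problem. Software defect prediction mines historical software repositories to design intrinsic defect-related metrics, and then uses machine learning and similar methods to discover and pinpoint defective modules in advance, so that limited resources can be allocated sensibly. Software defect prediction is therefore one of the important means of software quality assurance and has in recent years become a very important research topic in software engineering. This paper surveys Chinese and international research on defect prediction techniques over the past eight years (2010 to 2017), organizing the analysis around the forms that defect prediction takes. It first introduces the framework of software defect prediction models; it then classifies, summarizes and compares existing work from three aspects: defect datasets, model construction methods and evaluation metrics; finally, it discusses possible future research directions, opportunities and challenges in software defect prediction.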

10.

TextRank-based identification of search terms for software change tasks

Wang Shan 《Computer Systems & Applications》2020,29(10):262-266

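During the evolution or maintenance phase of software engineering, developers must handle many software change requests. These requests are usually written in natural language and typically involve one or more related problem domains. To make the requested changes, developers must map these concepts accurately to the corresponding source-code locations in the project, which requires formulating several search terms and searching the project with them. Studies show, however, that developers have some difficulty proposing accurate and suitable search queries for change tasks. This paper therefore proposes a TextRank-based method that identifies and proposes search terms for software change tasks by analysing their natural-language descriptions, with the aim of improving search accuracy, average precision and recall.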
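The abstract does not give the paper's graph construction or preprocessing, so the following is only a minimal sketch of the standard TextRank keyword idea: words that co-occur within a small window are linked in an undirected graph, and a PageRank-style iteration with damping factor 0.85 scores each word. The stop-word list, window size, iteration count and the name `textrank_terms` are assumptions.

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny stop-word list
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "on", "for", "with", "when", "is"}


def textrank_terms(text, window=2, damping=0.85, iterations=50, top_n=3):
    """Rank candidate search terms with a TextRank-style word graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    # Undirected co-occurrence graph; window=2 links only adjacent words
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    # PageRank-style iteration: each word shares its score equally among its neighbours
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    # Highest score first; ties broken alphabetically
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


request = "Fix crash when saving file on the save dialog"
print(textrank_terms(request))
```

Words that link many distinct neighbours (hubs in the change description) accumulate the highest scores and become the suggested search terms.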

11.

Research progress on software defect prediction combining multiple metrics

Yang Fengyu, Huang Yaxuan, Zhou Shijian, Zheng Wei 《Computer Engineering and Applications》2021,57(5):10-24

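Software defect prediction helps developers predict defective programs in advance and allocate limited testing resources sensibly. Its accuracy depends not only on the choice of prediction method but, even more, on the software metrics used. Defect prediction that combines multiple metrics has therefore become a current research hotspot. Taking metrics as the starting point, this paper systematically reviews research progress on traditional metrics, multiple metrics, and defect prediction combining multiple metrics. The main contributions are: an introduction to traditional code and process metrics, defect prediction models based on traditional metrics, and factors affecting data quality; an account of semantic-structure metrics; an analysis and enumeration of the evaluation measures currently used in software defect prediction; and an outlook on future research trends in multi-metric software defect prediction with respect to prediction granularity, traditional metrics, semantic-structure metrics and cross-project defect prediction.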

12.

DP-Share: Privacy-Preserving Software Defect Prediction Model Sharing Through Differential Privacy

Xiang Chen, Dun Zhang, Zhan-Qi Cui, Qing Gu, Xiao-Lin Ju 《Journal of Computer Science and Technology》2019,34(5):1020-1038

In current software defect prediction (SDP) research, most previous empirical studies use only datasets provided by the PROMISE repository, which may threaten the external validity of their results. Sharing SDP models instead of SDP datasets is a potential solution to this problem and can encourage researchers in academia and practitioners in industry to share more models. However, directly sharing models may disclose private information, for example through model inversion attacks. To the best of our knowledge, we are the first to apply differential privacy (DP) to privacy-preserving SDP model sharing, and we propose a novel method, DP-Share, since DP mechanisms can prevent such attacks when the privacy budget is carefully selected. In particular, DP-Share first preprocesses the dataset, for example by over-sampling minority instances (i.e., defective modules) and discretizing continuous features to optimize privacy budget allocation. It then uses a novel sampling strategy to create a set of training sets. Finally, it constructs decision trees on these training sets, and these trees form a random forest (i.e., the model). The last phase of DP-Share uses the Laplace and exponential mechanisms to satisfy the requirements of DP. In our empirical studies, we choose nine experimental subjects from real software projects, use AUC (area under the ROC curve) as the performance measure, and use holdout as the model validation technique. Privacy and utility analysis shows that DP-Share achieves better performance than the baseline method DF-Enhance in most cases under the same privacy budget. We also provide guidelines for using the proposed method effectively.
Our work attempts to fill the research gap in differential privacy for SDP, which can encourage researchers and practitioners to share more SDP models and thus effectively advance the state of the art of SDP.
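The Laplace mechanism that DP-Share relies on can be sketched for a single counting query, such as the number of defective modules reaching a tree leaf: add noise drawn from Laplace(0, sensitivity / epsilon). The function names, the seed and the counts below are hypothetical, and DP-Share's budget allocation across trees is not reproduced.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism: release true_count + Lap(sensitivity / epsilon).

    A counting query (e.g. how many defective modules reach a tree leaf) has
    sensitivity 1: adding or removing one module changes it by at most 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng if rng is not None else random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
print(noisy_count(12, epsilon=1.0, rng=rng))  # 12 plus noise of scale 1
```

A smaller privacy budget `epsilon` gives a larger noise scale, which is the privacy/utility trade-off the abstract refers to.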

13.

Protein subcellular localization prediction based on deep learning

Wang Yihao, Ding Hongwei, Li Bo, Bao Liyong, Zhang Yingjie 《Journal of Computer Applications》2020,40(11):3393-3399

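Traditional machine-learning algorithms still require features to be represented manually. To address this, a protein subcellular localization algorithm based on a stacked denoising autoencoder (SDAE) deep network is proposed. First, features are extracted from protein sequences with improved pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM) and conjoint triad (CT) encoding, and the three resulting feature vectors are fused into a new feature representation model for protein sequences. Next, the fused feature vector is fed into the SDAE network, which automatically learns a more effective feature representation. A Softmax regression classifier then predicts the subcellular location, and leave-one-out cross-validation is performed on the Viral proteins and Plant proteins datasets. Finally, the results are compared with those of several existing algorithms, including mGOASVM and HybridGO-Loc. Experimental results show that the proposed algorithm achieves an accuracy of 98.24% on the Viral proteins dataset, 9.35 percentage points higher than mGOASVM, and an accuracy of 97.63% on the Plant proteins dataset, 10.21 and 4.07 percentage points higher than mGOASVM and HybridGO-Loc respectively. These results indicate that the proposed algorithm effectively improves the accuracy of protein subcellular localization prediction.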

14.

An empirical study of factors affecting cross-project aging-related bug prediction with TLAP

Qin Fangyun, Wan Xiaohui, Yin Beibei 《Software Quality Journal》2020,28(1):107-134

Software aging is a phenomenon in which long-running software systems show an increasing failure rate and/or progressive performance degradation. Due to their nature, Aging-Related Bugs (ARBs) are hard to discover during software testing and are also challenging to reproduce. Therefore, automatically predicting ARBs before software release can help developers reduce ARB impact or avoid ARBs. Many bug prediction approaches have been proposed, and most of them show effectiveness in within-project prediction settings. However, due to the low presence and reproducing difficulty of ARBs, it is usually hard to collect sufficient training data to build an accurate prediction model. A recent work proposed a method named Transfer Learning based Aging-related bug Prediction (TLAP) for performing cross-project ARB prediction. Although this method considerably improves cross-project ARB prediction performance, it has been observed that its prediction result is affected by several key factors, such as the normalization methods, kernel functions, and machine learning classifiers. Therefore, this paper presents the first empirical study to examine the impact of these factors on the effectiveness of cross-project ARB prediction in terms of single-factor pattern, bigram pattern, and triplet pattern and validates the results with the Scott-Knott test technique. We find that kernel functions and classifiers are key factors affecting the effectiveness of cross-project ARB prediction, while normalization methods do not show statistical influence. In addition, the order of values in three single-factor patterns is maintained in three bigram patterns and one triplet pattern to a large extent. Similarly, the order of values in the three bigram patterns is also maintained in the triplet pattern.
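The abstract does not list which normalization methods were compared. Two common choices, min-max scaling to [0, 1] and z-score standardization, are sketched below only to illustrate what the "normalization method" factor varies; whether TLAP used these exact variants is an assumption, and the metric values are hypothetical.

```python
import statistics


def min_max(values):
    """Rescale to [0, 1]; a constant feature maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


loc = [120, 340, 95, 1500]  # hypothetical lines-of-code metric for four modules
print(min_max(loc))
print(z_score(loc))
```

Min-max scaling is sensitive to a single extreme module (here the 1500-line one squeezes the others toward 0), whereas z-scores keep the spread relative to the mean.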

16.

Automatic multi-grade classification of intervertebral foraminal stenosis based on deep learning

Hong Yanfei, Wei Benzheng, Liu Chuan, Han Zhongyi, Li Tianyang 《CAAI Transactions on Intelligent Systems》2019,14(4):708-715

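Preoperative qualitative grading of foraminal stenosis is crucial for clinicians in planning treatment and for patients' recovery, but many clinical problems remain in this area, and research and effective methods to assist clinicians in diagnosis are lacking. To improve the accuracy of computer-aided diagnosis of foraminal stenosis and the efficiency of clinicians, this paper proposes a deep-learning-based algorithm for automatic grading of foraminal stenosis images. Intervertebral foramen images are extracted from sagittal spinal MRI scans and preprocessed; a supervised deep convolutional neural network is designed for automatic multi-grade classification of the foramen image dataset; and transfer learning is used to address the overfitting of deep learning on small datasets. Experimental results show that the algorithm achieves a classification accuracy above 87.5% on the spinal foramen image dataset, with good robustness and generalization.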

17.

A new fuzzy rule based algorithm for estimating software faults in early phase of development

Subhashis Chatterjee, Bappa Maji 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2016,20(10):4023-4035

Estimating the reliability and the number of faults present in software early in development, i.e., in the requirement analysis or design phase, is very beneficial for developing reliable software at optimal cost. Early-phase reliability prediction is highly desirable for stakeholders, software developers, managers and end users. Since failure data are unavailable early in development, models for early software fault prediction are built from reliability-relevant software metrics and data from similar projects. The proposed model feeds the linguistic values of software metrics into a fuzzy inference system to predict the total number of faults present in software during its requirement analysis phase. An algorithm is proposed for developing a general fuzzy rule base that takes into account a specific target reliability, the weight of each input software metric and the size of the software. Data from 20 real software projects are used to validate the model. The linguistic values of four software metrics related to the requirement analysis phase are used as model inputs. The performance of the proposed model is compared with that of two existing early software fault prediction models.
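The abstract does not give the membership functions or the rule base, so this is a deliberately tiny, hypothetical Sugeno-style sketch of fuzzy inference over linguistic values: two metrics already scaled to [0, 1] are fuzzified into LOW, MEDIUM and HIGH with triangular membership functions, each rule fires with the minimum of its antecedent memberships, and the fault estimate is the firing-strength-weighted average of singleton outputs. The sets, the "worse metric decides" rule and the fault counts 5/15/30 are invented for illustration.

```python
# Hypothetical triangular fuzzy sets (a, b, c) over a metric scaled to [0, 1]
SETS = {"LOW": (-0.5, 0.0, 0.5), "MEDIUM": (0.0, 0.5, 1.0), "HIGH": (0.5, 1.0, 1.5)}
ORDER = ["LOW", "MEDIUM", "HIGH"]
FAULTS = {"LOW": 5.0, "MEDIUM": 15.0, "HIGH": 30.0}  # hypothetical singleton outputs


def triangular(x, a, b, c):
    """Membership of x in a triangular set with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def estimate_faults(m1, m2):
    """Weighted-average (Sugeno-style) inference over a 3 x 3 rule base."""
    num = den = 0.0
    for l1 in ORDER:
        for l2 in ORDER:
            # Rule strength: AND of the two antecedents via min
            strength = min(triangular(m1, *SETS[l1]), triangular(m2, *SETS[l2]))
            if strength == 0.0:
                continue
            # Hypothetical rule: the worse (higher) linguistic level decides the output
            level = max(l1, l2, key=ORDER.index)
            num += strength * FAULTS[level]
            den += strength
    return num / den if den else 0.0


print(estimate_faults(0.25, 0.0))  # 10.0
```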

18.

An improved scale-dependent pooling model for traffic image recognition

Xu Zhe, Feng Changhua 《Journal of Computer Applications》2018,38(3):671-676

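Traffic signs occupy a small proportion of natural scenes, so few features are extracted from them and recognition accuracy is low. To address this, an improved scale-dependent pooling (SDP) model is proposed for recognizing small-scale traffic images. First, because the deep convolutional layers of a neural network carry good contour information and class features, the SDP model, which extracts features only from shallow convolutional layers, is supplemented with deep-layer features (SD-SDP) in its mapped output, enriching the feature information. Second, because the single-level spatial pyramid pooling in SDP loses edge information, multi-scale sliding-window pooling (MSP) is used to pool features to a fixed dimension, strengthening the edge information of small targets. Finally, the improved scale-dependent pooling model is applied to traffic sign recognition. Experimental results show that, compared with the original SDP algorithm, more features are extracted and recognition accuracy on small-scale traffic images improves noticeably.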
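MSP's window sizes and strides are not given in the abstract. The sketch below instead shows spatial-pyramid-style multi-scale max pooling, the general idea of turning a feature map of any size into a fixed-length vector by max-pooling over grids at several scales; the levels (1, 2, 4) and the function names are assumptions, not the paper's configuration.

```python
def max_pool_grid(fmap, n):
    """Max-pool a 2-D feature map (list of rows) into n x n bins, row-major."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(n):
        # Bin i covers rows floor(i*h/n) .. ceil((i+1)*h/n) - 1, never empty
        r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
        for j in range(n):
            c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
            pooled.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled


def multi_scale_pool(fmap, levels=(1, 2, 4)):
    """Concatenate bins from every level: length is sum(n*n), whatever the map size."""
    features = []
    for n in levels:
        features.extend(max_pool_grid(fmap, n))
    return features


fmap = [[r * 4 + c for c in range(4)] for r in range(4)]  # values 0..15
print(multi_scale_pool(fmap, levels=(1, 2)))  # [15, 5, 7, 13, 15]
```

Because the output length depends only on `levels`, a small traffic sign and a large one yield vectors of the same size for the classifier.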

19.

Classification with reject option for software defect prediction

《Applied Soft Computing》2016

Context: Software defect prediction (SDP) is an important task in software engineering. Along with estimating the number of defects remaining in software systems and discovering defect associations, classifying the defect-proneness of software modules plays an important role in SDP. Several machine-learning methods have been applied to treat the defect-proneness of software modules as a classification problem. This type of “yes” or “no” decision is an important drawback in the decision-making process and, if not precise, may lead to misclassifications. To the best of our knowledge, existing approaches rely on fully automated module classification and do not provide a way to incorporate extra knowledge during the classification process. This knowledge can help avoid misclassifications when system modules cannot be classified reliably. Objective: We seek to develop an SDP method that (i) incorporates a reject option in the classifier to improve the reliability of the decision-making process; and (ii) makes it possible to postpone the final decision on rejected modules for expert analysis or even for another classifier using extra domain knowledge. Method: We develop an SDP method called rejoELM and its variant, IrejoELM. Both methods are built upon the weighted extreme learning machine (ELM) with a reject option, which makes it possible to postpone the final decision on non-classified (rejected) modules to another moment. While rejoELM aims to maximize accuracy for a given rejection rate, IrejoELM maximizes the F-measure, which makes IrejoELM an alternative for classification with reject option on imbalanced datasets. Results: rejoELM and IrejoELM are tested on five datasets of source code metrics extracted from real-world open-source software projects. Results indicate that rejoELM achieves accuracy across several rejection rates comparable to some state-of-the-art classifiers with reject option.
Although IrejoELM shows lower accuracy at several rejection rates, it clearly outperforms all other methods when the F-measure is used as the performance metric. Conclusion: rejoELM is a valid alternative for classification with reject option when classes are nearly equally represented, whereas IrejoELM is the best alternative for classification with reject option on imbalanced datasets. Since SDP problems are usually characterized as imbalanced learning problems, the use of IrejoELM is recommended.
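ELM training is beyond a short example, but the reject option itself can be sketched with a simple confidence threshold in the spirit of Chow's rule: a module whose predicted probability of being defective is too close to 0.5 is deferred instead of labelled, and accuracy is then measured only on the accepted modules. The threshold, function names and probabilities are assumptions; this is not rejoELM or IrejoELM.

```python
def classify_with_reject(probs, threshold):
    """probs: P(defective) per module. Returns 1, 0, or None (rejected) per module."""
    decisions = []
    for p in probs:
        confidence = max(p, 1 - p)
        if confidence < threshold:
            decisions.append(None)  # defer to an expert or another classifier
        else:
            decisions.append(1 if p >= 0.5 else 0)
    return decisions


def accuracy_and_rejection(decisions, y_true):
    """Accuracy on accepted modules, and the fraction of modules rejected."""
    accepted = [(d, t) for d, t in zip(decisions, y_true) if d is not None]
    rejection_rate = 1 - len(accepted) / len(decisions)
    accuracy = sum(d == t for d, t in accepted) / len(accepted) if accepted else 0.0
    return accuracy, rejection_rate


probs = [0.95, 0.55, 0.10, 0.48, 0.80]  # hypothetical classifier outputs
y_true = [1, 0, 0, 1, 1]
decisions = classify_with_reject(probs, threshold=0.7)
print(decisions)  # [1, None, 0, None, 1]
print(accuracy_and_rejection(decisions, y_true))
```

Raising the threshold rejects more modules and usually raises accuracy on the rest, which is the accuracy-versus-rejection-rate trade-off the abstract evaluates.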

20.

Software development cost estimation using wavelet neural networks

K. Vinay Kumar, Mahil Carr 《Journal of Systems and Software》2008,81(11):1853-1867

Software development has become an essential investment for many organizations. Software engineering practitioners have become increasingly concerned with accurately predicting the cost and quality of software products under development. Accurate estimates are desirable, but no model has proved successful at predicting software development cost effectively and consistently. In this paper, we propose using wavelet neural networks (WNN) to forecast software development effort. We used two types of WNN, with the Morlet function and the Gaussian function as transfer functions, and also proposed a threshold-acceptance training algorithm for wavelet neural networks (TAWNN). The WNN variants are compared with other techniques, namely multilayer perceptron (MLP), radial basis function network (RBFN), multiple linear regression (MLR), dynamic evolving neuro-fuzzy inference system (DENFIS) and support vector machine (SVM), in terms of the mean magnitude of relative error (MMRE) on the Canadian financial (CF) dataset and the IBM data processing services (IBMDPS) dataset. The experiments show that WNN-Morlet on the CF dataset and WNN-Gaussian on the IBMDPS dataset outperformed all other techniques. TAWNN also outperformed all other techniques except WNN.
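MMRE, the error measure used in this comparison, averages |actual - predicted| / actual over all projects; lower is better. A minimal sketch with hypothetical effort values, assuming every actual value is non-zero:

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


effort_actual = [100, 200, 50]  # hypothetical person-months
effort_predicted = [110, 180, 50]
print(round(mmre(effort_actual, effort_predicted), 4))  # 0.0667
```

Because each error is divided by the actual effort, a 10-unit miss on a small project weighs more than the same miss on a large one.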