
1.

Linking software testing results with a machine learning approach

Alexandre Rafael Lenz Aurora Pozo Silvia Regina Vergilio 《Engineering Applications of Artificial Intelligence》2013,26(5-6):1631-1640

Software testing techniques and criteria are considered complementary since they can reveal different kinds of faults and test distinct aspects of the program. Functional criteria, such as Category Partition, are difficult to automate and are usually applied manually. Structural and fault-based criteria generally provide measures to evaluate test sets. The existing supporting tools produce a lot of information, including inputs and produced outputs, structural coverage, mutation score, and faults revealed. However, such information is not linked to functional aspects of the software. In this work, we present an approach based on machine learning techniques to link test results from the application of different testing techniques. The approach groups test data into similar functional clusters. Then, according to the tester's goals, it generates classifiers (rules) that have different uses, including selection and prioritization of test cases. The paper also presents results from experimental evaluations and illustrates such uses.

2.

Cross-document structural relationship identification using supervised machine learning

Yogan Jaya Kumar Naomie Salim Basit Raza 《Applied Soft Computing》2012,12(10):3124-3131


3.

Transfer learning for cross-company software defect prediction

Ying Ma Guangchun Luo Aiguo Chen 《Information and Software Technology》2012,54(3):248-256

Context

Software defect prediction studies usually build models using within-company data, but very few focus on prediction models trained with cross-company data. It is difficult to employ models built on within-company data in practice because such local data repositories are often lacking. Recently, transfer learning has attracted more and more attention for building a classifier in a target domain using data from a related source domain. It is very useful when the distributions of training and test instances differ, but is it appropriate for cross-company software defect prediction?

Objective

In this paper, we consider the cross-company defect prediction scenario, where source and target data are drawn from different companies. To harness cross-company data, we try to exploit the transfer learning method to build a faster and highly effective prediction model.

Method

Unlike prior work that selects training data similar to the test data, we propose a novel algorithm called Transfer Naive Bayes (TNB), which uses the information of all the proper features in the training data. Our solution estimates the distribution of the test data and transfers cross-company data information into the weights of the training data. The defect prediction model is then built on these weighted data.

Results

This article presents a theoretical analysis of the compared methods and shows experimental results on data sets from different organizations. The results indicate that TNB is more accurate in terms of AUC (the area under the receiver operating characteristic curve) and requires less runtime than the state-of-the-art methods.

Conclusion

It is concluded that when there are too few local training data to train good classifiers, useful knowledge from different-distribution training data at the feature level may help. We are optimistic that our transfer learning method can guide optimal resource allocation strategies, which may reduce software testing cost and increase the effectiveness of the software testing process.
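The weighting step described in the Method paragraph of this entry can be illustrated with a small sketch. The abstract does not give the weighting formula, so everything below is an assumption made for illustration: the helper names `feature_ranges` and `similarity_weights` are hypothetical, and the weight `s / (k - s + 1) ** 2` is just one simple choice that grows as more of a training row's features fall inside the range observed in the target (test) data.

```python
def feature_ranges(test_rows):
    """Per-feature (min, max) observed in the unlabeled target (test) data."""
    columns = list(zip(*test_rows))
    return [(min(col), max(col)) for col in columns]


def similarity_weights(train_rows, test_rows):
    """Weight each cross-company training row by how target-like it looks.

    Assumed rule (not taken from the abstract): s counts the features of a
    training row that lie inside the target range, k is the number of
    features, and the weight is s / (k - s + 1) ** 2.
    """
    ranges = feature_ranges(test_rows)
    k = len(ranges)
    weights = []
    for row in train_rows:
        s = sum(1 for value, (lo, hi) in zip(row, ranges) if lo <= value <= hi)
        weights.append(s / (k - s + 1) ** 2)
    return weights
```

Rows that match the target data on every feature receive the largest weight, and rows that match on none receive zero. The weights could then be handed to any classifier that accepts per-instance weights, such as a weighted Naive Bayes model.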

4.

Online learning method using support vector machine for surface-electromyogram recognition

Shuji Kawano Dai Okumura Hiroki Tamura Hisasi Tanaka Koichi Tanno 《Artificial Life and Robotics》2009,13(2):483-487

Surface electromyogram (s-EMG) signal recognition using neural networks is a method that identifies the relation between s-EMG patterns. However, it is not sufficiently satisfying for the user because s-EMG signals change with muscle wasting, changes in the electrode position, and so on. A support vector machine (SVM) is one of the most powerful tools for solving classification problems, but it does not have an online learning technique. In this article, we propose an online learning method using SVM with a pairwise coupling technique for s-EMG recognition. We compared its performance with the original SVM and a neural network. Simulation results showed that our proposed method is better than the original SVM. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.

5.

A comparative study for content-based dynamic spam classification using four machine learning algorithms (total citations: 1; self-citations: 0; citations by others: 1)

Bo Yu Zong-ben Xu 《Knowledge》2008,21(4):355-362

The growth in the number of email users has resulted in a dramatic increase in spam emails during the past few years. In this paper, four machine learning algorithms, namely Naïve Bayesian (NB), neural network (NN), support vector machine (SVM) and relevance vector machine (RVM), are proposed for spam classification. An empirical evaluation of them on benchmark spam filtering corpora is presented. The experiments are performed with different training set sizes and extracted feature sizes. Experimental results show that the NN classifier is unsuitable for use alone as a spam rejection tool. Generally, the performance of the SVM and RVM classifiers is clearly superior to that of the NB classifier. Compared with SVM, RVM is shown to provide similar classification results with fewer relevance vectors and much faster testing time. Despite its slower learning procedure, RVM is more suitable than SVM for spam classification in applications that require low complexity.
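As a concrete reference point for the simplest of the four classifiers, here is a minimal sketch of a content-based Naive Bayes spam filter. It is not the paper's implementation: the class name `NaiveBayesSpamFilter`, the whitespace tokenizer, and the add-one (Laplace) smoothing are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of messages
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def log_score(self, text, label):
        # log P(label) + sum of log P(word | label), with add-one smoothing.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocabulary)
        for word in text.lower().split():
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, text):
        # Pick the label with the highest posterior log-score.
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))
```

After calling `train` on a handful of labeled messages, `classify` returns the most probable label for a new message. Training only updates counts, which is one reason this kind of model suits a dynamic setting where new mail keeps arriving.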

6.

Systematic literature review of machine learning based software development effort estimation models

Jianfeng Wen Shixian Li Zhiyong Lin Yong Hu Changqin Huang 《Information and Software Technology》2012,54(1):41-59

Context

Software development effort estimation (SDEE) is the process of predicting the effort required to develop a software system. In order to improve estimation accuracy, many researchers have proposed machine learning (ML) based SDEE models (ML models) since the 1990s. However, there has been no attempt to analyze the empirical evidence on ML models in a systematic way.

Objective

This research aims to systematically analyze ML models from four aspects: type of ML technique, estimation accuracy, model comparison, and estimation context.

Method

We performed a systematic literature review of empirical studies on ML models published in the last two decades (1991-2010).

Results

We have identified 84 primary studies relevant to the objective of this research. After investigating these studies, we found that eight types of ML techniques have been employed in SDEE models. Overall, the estimation accuracy of these ML models is close to the acceptable level and is better than that of non-ML models. Furthermore, different ML models have different strengths and weaknesses and thus favor different estimation contexts.

Conclusion

ML models are promising in the field of SDEE. However, the application of ML models in industry is still limited, so more effort and incentives are needed to facilitate their application. To this end, based on the findings of this review, we provide recommendations for researchers as well as guidelines for practitioners.

7.

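Support vector machines (total citations: 11; self-citations: 0; citations by others: 11)

张浩然 韩正之 李昌刚 《计算机科学》 (Computer Science) 2002,29(12):135-137

1 Introduction. Data-based machine learning is an important aspect of artificial intelligence technology. Starting from observed data (samples), it seeks patterns in the data and functional dependencies among the data, and uses these patterns and dependencies to classify, recognize, and predict future or unobservable data. Approaches to realizing it fall roughly into three kinds. The first is classical (parametric) statistical estimation, in which the form of the parameters is known and training samples are used to estimate the parameter values. This approach has serious limitations. First, it requires that the form of the sample distribution be known. Second, traditional statistics studies asymptotic theory as the number of samples tends to infinity, and most existing learning methods are also based on this assumption. In practical problems, however, the number of samples is often limited, so some learning methods that are excellent in theory may in practice ...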

8.

An online core vector machine with adaptive MEB adjustment

Di Wang Peng Zhang 《Pattern recognition》2010,43(10):3468-3482

The support vector machine (SVM) is a widely used classification technique. However, it is difficult to use SVMs to deal with very large data sets efficiently. Although decomposed SVMs (DSVMs) and core vector machines (CVMs) have been proposed to overcome this difficulty, they cannot be applied to online classification (or classification with learning ability) because, when newly arriving samples are misclassified, the classifier has to be adjusted based on both the new misclassified samples and all the training samples. The purpose of this paper is to address this issue by proposing an online CVM classifier with adaptive minimum-enclosing-ball (MEB) adjustment, called online CVMs (OCVMs). The OCVM algorithm has two features: (1) many training samples are permanently deleted during the training process without influencing the final trained classifier; (2) with a limited number of selected samples obtained in the training step, the classifier can be adjusted online based on newly arriving misclassified samples. Experiments on both synthetic and real-world data have shown the validity and effectiveness of the OCVM algorithm.
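To make the minimum-enclosing-ball idea more tangible, the sketch below approximates the MEB of a point set in plain Euclidean space. This is a swapped-in textbook technique (the Bădoiu-Clarkson iteration), not the OCVM algorithm from the abstract, which works in a kernel-induced feature space and adjusts the ball adaptively; the function name `approximate_meb` and the fixed iteration count are assumptions.

```python
import math


def approximate_meb(points, iterations=1000):
    """Approximate the minimum enclosing ball of a list of points.

    Each step pulls the center toward the currently farthest point with a
    shrinking step size 1 / (i + 1), so the center settles near the optimum.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, farthest)]
    # The radius is whatever is needed to cover every point from this center.
    radius = max(math.dist(p, center) for p in points)
    return center, radius
```

Only the farthest point matters at each step, which hints at why a small subset of samples can be enough to describe the ball.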

9.

Predicting defect-prone software modules using support vector machines (total citations: 2; self-citations: 0; citations by others: 2)

Karim O. Elish 《Journal of Systems and Software》2008,81(5):649-660

Effective prediction of defect-prone software modules can enable software developers to focus quality assurance activities and allocate effort and resources more efficiently. Support vector machines (SVM) have been successfully applied to both classification and regression problems in many applications. This paper evaluates the capability of SVM in predicting defect-prone software modules and compares its prediction performance against eight statistical and machine learning models in the context of four NASA datasets. The results indicate that the prediction performance of SVM is generally better than, or at least competitive with, the compared models.

10.

Face recognition based on extreme learning machine (total citations: 2; self-citations: 0; citations by others: 2)

Weiwei Zong Guang-Bin Huang 《Neurocomputing》2011,74(16):2541-2551

The extreme learning machine (ELM) is an efficient learning algorithm for generalized single hidden layer feedforward networks (SLFNs), which performs well in both regression and classification applications. It has recently been shown that, from the optimization point of view, ELM and the support vector machine (SVM) are equivalent, but ELM has less stringent optimization constraints. Due to these mild optimization constraints, ELM is easy to implement and usually obtains better generalization performance. In this paper we study the performance of the one-against-all (OAA) and one-against-one (OAO) ELM for classification in multi-label face recognition applications. The performance is verified on four benchmark face image data sets.
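The difference between the one-against-all and one-against-one schemes mentioned above is easiest to see by listing the binary problems each one creates. The sketch below does only that decomposition; it does not train an ELM, and the function names are hypothetical.

```python
from itertools import combinations


def one_against_all_tasks(labels):
    """One binary task per class: that class versus all remaining classes."""
    classes = sorted(set(labels))
    return [(c, [other for other in classes if other != c]) for c in classes]


def one_against_one_tasks(labels):
    """One binary task per unordered pair of classes."""
    classes = sorted(set(labels))
    return list(combinations(classes, 2))
```

For K classes, OAA yields K binary tasks, while OAO yields K(K-1)/2 tasks, each trained on only the samples of two classes. With many identities in a face data set, the OAO count grows quadratically, which is the usual trade-off between the two schemes.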

11.

Nonlinear regression in environmental sciences using extreme learning machines: A comparative evaluation

《Environmental Modelling & Software》2015

The extreme learning machine (ELM), a single-hidden-layer feedforward neural network algorithm, was tested on nine environmental regression problems. The prediction accuracy and computational speed of the ensemble ELM were evaluated against multiple linear regression (MLR) and three nonlinear machine learning (ML) techniques: artificial neural network (ANN), support vector regression and random forest (RF). Simple automated algorithms were used to estimate the parameters (e.g. number of hidden neurons) needed for model training. Scaling the range of the random weights in ELM improved its performance. Excluding large datasets (with a large number of cases and predictors), ELM tended to be the fastest among the nonlinear models. For large datasets, RF tended to be the fastest. ANN and ELM had similar skill, but ELM was much faster than ANN except for large datasets. Generally, the tested ML techniques outperformed MLR, but no single method was best for all nine datasets.

12.

Characterizing forest canopy structure with lidar composite metrics and machine learning (total citations: 1; self-citations: 0; citations by others: 1)

Kaiguang Zhao Sorin Popescu Yong Pang 《Remote sensing of environment》2011,115(8):1978-1996

A lack of reliable observations for canopy science research is being partly overcome by the gradual use of lidar remote sensing. This study aims to improve lidar-based canopy characterization with airborne laser scanners through the combined use of lidar composite metrics and machine learning models. Our so-called composite metrics comprise a relatively large number of lidar predictors that tend to retain as much information as possible when reducing raw lidar point clouds into a format suitable as inputs to predictive models of canopy structural variables. The information-rich property of such composite metrics is further complemented by machine learning, which offers an array of supervised learning models capable of relating canopy characteristics to high-dimensional lidar metrics via complex, potentially nonlinear functional relationships. Using coincident lidar and field data over an Eastern Texas forest in USA, we conducted a case study to demonstrate the ubiquitous power of the lidar composite metrics in predicting multiple forest attributes and also illustrated the use of two kernel machines, namely, support vector machine and Gaussian processes (GP). Results show that the two machine learning models in conjunction with the lidar composite metrics outperformed traditional approaches such as the maximum likelihood classifier and linear regression models. For example, the five-fold cross validation for GP regression models (vs. linear/log-linear models) yielded a root mean squared error of 1.06 (2.36) m for Lorey's height, 0.95 (3.43) m for dominant height, 5.34 (8.51) m²/ha for basal area, 21.4 (40.5) Mg/ha for aboveground biomass, 6.54 (9.88) Mg/ha for belowground biomass, 0.75 (2.76) m for canopy base height, 2.2 (2.76) m for canopy ceiling height, 0.015 (0.02) kg/m³ for canopy bulk density, 0.068 (0.133) kg/m² for available canopy fuel, and 0.33 (0.39) m²/m² for leaf area index. 
Moreover, uncertainty estimates from the GP regression were more indicative of the true errors in the predicted canopy variables than those from their linear counterparts. With the ever-increasing accessibility of multisource remote sensing data, we envision a concomitant expansion in the use of advanced statistical methods, such as machine learning, to explore the potentially complex relationships between canopy characteristics and remotely-sensed predictors, accompanied by a desideratum for improved error analysis.
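The error figures quoted in this entry are root mean squared errors from five-fold cross validation. The sketch below shows how such a number can be computed in general; it is not the authors' pipeline. The interleaved fold assignment, the function names, and the `fit` callback (which takes training inputs and targets and returns a prediction function) are all assumptions, and `fit_mean` is a deliberately trivial stand-in model.

```python
import math


def k_fold_indices(n_samples, k=5):
    """Split indices 0..n_samples-1 into k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        splits.append((train_idx, test_idx))
    return splits


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def cross_validated_rmse(xs, ys, fit, k=5):
    """Predict every sample from a model trained without its fold, then score."""
    predictions = [0.0] * len(ys)
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = model(xs[i])
    return rmse(predictions, ys)


def fit_mean(train_xs, train_ys):
    """Baseline model: always predict the mean of the training targets."""
    mean = sum(train_ys) / len(train_ys)
    return lambda x: mean
```

Because every prediction comes from a model that never saw that sample, the resulting RMSE estimates out-of-sample error, which is what makes the GP and linear model figures comparable.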

13.

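Research on an improved algorithm for support vector machines

郝志峰 舒蕾 林大瀛 杨晓伟 《计算机工程与应用》 (Computer Engineering and Applications) 2003,39(12):81-83

This paper introduces the basic concepts and learning algorithms of the support vector machine (SVM) and describes in detail an improved algorithm, LSVM, which offers useful insight for research on machine learning algorithms.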

14.

Approximating support vector machine with artificial neural network for fast prediction

《Expert systems with applications》2014,41(10):4989-4995

The support vector machine (SVM) is a powerful algorithm for classification and regression problems and is widely applied in real-world applications. However, its high computational load in the test phase makes it difficult to use in practice. In this paper, we propose the hybrid neural network (HNN), a method to accelerate an SVM in the test phase by approximating the SVM. The proposed method approximates the SVM using an artificial neural network (ANN). The resulting regression function of the ANN replaces the decision function or the regression function of the SVM. Since the prediction of the ANN requires significantly less computation than that of the SVM, the proposed method yields faster test speed. The proposed method is evaluated by experiments on real-world benchmark datasets. Experimental results show that the proposed method successfully accelerates SVM in the test phase with little or no loss in prediction performance.

15.

Web-based algorithm for cylindricity evaluation using support vector machine learning

Keun Lee Sohyung Cho Shihab Asfour 《Computers & Industrial Engineering》2011

This paper introduces a cylindricity evaluation algorithm based on support vector machine learning with a specific kernel function, referred to as SVR, as a viable alternative to the traditional least squares method (LSQ) and the non-linear programming algorithm (NLP). Using the theory of support vector machine regression, the proposed algorithm provides more robust evaluation in terms of CPU time and accuracy than NLP, and this is supported by computational experiments. Interestingly, it has been shown that SVR significantly outperforms LSQ in terms of accuracy, while it can evaluate the cylindricity in a more robust fashion than NLP when the variance of the data points increases. The robust nature of the proposed algorithm is expected because it converts the original nonlinear problem with nonlinear constraints into another nonlinear problem with linear constraints. In addition, the proposed algorithm is programmed using the Java Runtime Environment to provide users with a Web-based open source environment. In a real-world setting, this would provide manufacturers with an algorithm that can be trusted to give the correct answer rather than rejecting a good part because of inaccurate computational results.

16.

Support vector machine active learning for music retrieval (total citations: 7; self-citations: 0; citations by others: 7)

Michael I. Mandel Graham E. Poliner Daniel P. W. Ellis 《Multimedia Systems》2006,12(1):3-13


17.

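A survey of support vector machines and their applications (total citations: 78; self-citations: 1; citations by others: 78)

祁亨年 《计算机工程》 (Computer Engineering) 2004,30(10):6-9

Based on an analysis of the principles of support vector machines, this paper surveys applied SVM research in face detection, verification and recognition; speaker/speech recognition; character/handwriting recognition; image processing; and other applications. It also discusses the strengths and weaknesses of SVM and the outlook for its applied research.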

18.

A systematic review of software fault prediction studies

Cagatay Catal Banu Diri 《Expert systems with applications》2009,36(4):7346-7354

This paper provides a systematic review of previous software fault prediction studies with a specific focus on metrics, methods, and datasets. The review uses 74 software fault prediction papers from 11 journals and several conference proceedings. According to the review results, since 2005 the usage percentage of public datasets has increased significantly and the usage percentage of machine learning algorithms has increased slightly. In addition, method-level metrics are still the most dominant metrics in the fault prediction research area, and machine learning algorithms are still the most popular methods for fault prediction. Researchers working in the software fault prediction area should continue to use public datasets and machine learning algorithms to build better fault predictors. The usage percentage of class-level metrics is below acceptable levels, and they should be used much more than they are now in order to predict faults earlier, in the design phase of the software life cycle.

19.

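An improved training algorithm for support vector machines

王国胜 钟义信 《计算机工程与应用》 (Computer Engineering and Applications) 2002,38(21):4-6

The support vector machine is a machine learning technique developed in the mid-1990s, and the NPA algorithm is one of the best learning algorithms currently available. Building on references [3] and [8], and through extensive experiments and in-depth analysis, this paper finds that the algorithm still has some shortcomings and proposes an improved NPA algorithm. Experiments show that the new algorithm is simple to implement and stable in performance, and that it learns significantly faster than the NPA algorithm without increasing complexity.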

20.

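Introduction to the support vector machine algorithm and the ChemSVM software (total citations: 53; self-citations: 27; citations by others: 26)

陆文聪 陈念贻 叶晨洲 李国正 《计算机与应用化学》 (Computers and Applied Chemistry) 2002,19(6):697-702

Statistical learning theory (SLT) and the support vector machine (SVM) algorithm, proposed by Vladimir N. Vapnik and others, have produced encouraging research results. This paper introduces the principles of this new theory and algorithm and looks ahead to the application prospects of this new achievement of computer science in chemistry and chemical engineering. The "ChemSVM" software provides a general-purpose support vector machine algorithm and integrates it with databases, knowledge bases, atomic parameters, and other data mining methods.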