Similar Documents
20 similar documents found.
1.
An ensemble is a group of learners that work together as a committee to solve a problem. Existing ensemble learning algorithms often generate unnecessarily large ensembles, which consume extra computational resources and may degrade generalization performance. Ensemble pruning algorithms aim to find a good subset of ensemble members that constitutes a small ensemble, saving computational resources while performing as well as, or better than, the unpruned ensemble. This paper introduces a probabilistic ensemble pruning algorithm that prunes the ensemble by choosing a set of "sparse" combination weights, most of which are zero. To obtain sparse combination weights while satisfying their nonnegativity constraint, a left-truncated, nonnegative Gaussian prior is adopted over every combination weight. The expectation propagation (EP) algorithm is employed to approximate the posterior of the weight vector. The leave-one-out (LOO) error is obtained as a by-product of EP training without extra computation and is a good indicator of the generalization error; it is therefore used together with the Bayesian evidence for model selection. An empirical study on several regression and classification benchmark data sets shows that our algorithm uses far fewer component learners but performs as well as, or better than, the unpruned ensemble. Our results are very competitive with those of other ensemble pruning algorithms.
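A minimal sketch of the core idea of pruning by sparse nonnegative combination weights. It uses nonnegative least squares (NNLS) as a simple stand-in for the paper's EP posterior with truncated Gaussian priors; NNLS is not the authors' method, but it also tends to drive many weights to exactly zero, so zero-weight members can be dropped.

```python
# Sketch: prune an ensemble by fitting sparse nonnegative combination weights.
import numpy as np
from scipy.optimize import nnls

def prune_ensemble(member_preds, y, tol=1e-8):
    """member_preds: (n_samples, n_members) predictions of each member;
    y: (n_samples,) targets. Returns (kept member indices, their weights)."""
    w, _ = nnls(member_preds, y)          # nonnegative weights, many end up 0
    kept = np.flatnonzero(w > tol)        # members that survive pruning
    return kept, w[kept] / w[kept].sum()  # renormalize surviving weights

# toy usage: 20 noisy copies of the target act as "members"
rng = np.random.default_rng(0)
y = rng.normal(size=200)
preds = y[:, None] + rng.normal(scale=0.5, size=(200, 20))
kept, w = prune_ensemble(preds, y)
print(f"kept {len(kept)} of 20 members")
```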

2.
As an effective approach to multi-input multi-output regression estimation, multi-dimensional support vector regression (M-SVR) is generally capable of obtaining better predictions than applying a conventional support vector machine (SVM) independently to each output dimension. However, although many generalization error bounds exist for conventional SVMs, none of them can be directly applied to M-SVR. In this paper, a new leave-one-out (LOO) error estimate for M-SVR is first derived through a virtual LOO cross-validation procedure. This LOO estimate can be calculated directly once training has finished, with lower computational complexity than the traditional LOO method. Based on this estimate, a new model selection method for M-SVR built on a multi-objective optimization strategy is then proposed. Experiments on both a noisy toy regression function and a practical engineering data set (dynamic load identification on a cylinder vibration system) demonstrate the competitive generalization performance and computational cost of the proposed method.

3.
As a novel learning algorithm for single-hidden-layer feedforward neural networks, the extreme learning machine (ELM) has become a promising tool for regression and classification applications. However, it is not trivial for ELMs to find the proper number of hidden neurons, because the input weights and hidden biases are not optimized. In this paper, a new model selection method for ELM based on multi-objective optimization is proposed to obtain compact networks with good generalization ability. First, a new leave-one-out (LOO) error bound for ELM is derived; it can be calculated at negligible computational cost once ELM training is finished. Hidden nodes are then added to the network one by one, and at each step a multi-objective optimization algorithm selects optimal input weights by simultaneously minimizing this LOO bound and the norm of the output weights in order to avoid over-fitting. Experiments on five UCI regression data sets demonstrate that the proposed algorithm generally obtains better generalization performance with a more compact network than conventional gradient-based back-propagation, the original ELM, and the evolutionary ELM.
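A minimal sketch of the setup (my assumption of the details, not the paper's exact bound): an ELM with random input weights and biases, plus closed-form LOO residuals e_i / (1 - h_ii) from the hat matrix, which cost almost nothing once the least-squares output weights are computed.

```python
# Sketch: ELM training plus a PRESS-style LOO error as a cheap by-product.
import numpy as np

def elm_with_loo(X, y, n_hidden, rng):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer output matrix
    H_pinv = np.linalg.pinv(H)
    beta = H_pinv @ y                             # output weights (least squares)
    h_diag = np.einsum("ij,ji->i", H, H_pinv)     # diagonal of hat matrix H H^+
    resid = y - H @ beta
    loo_mse = np.mean((resid / (1.0 - h_diag)) ** 2)
    return beta, loo_mse

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 3))
y = np.sin(X).sum(axis=1) + rng.normal(scale=0.1, size=100)
for n_hidden in (5, 10, 20, 40):                  # grow the network, watch LOO
    _, loo = elm_with_loo(X, y, n_hidden, rng)
    print(n_hidden, round(loo, 4))
```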

4.
Model structure selection is of crucial importance in radial basis function (RBF) neural networks. Existing model structure selection algorithms are essentially forward selection or backward elimination methods, which may lead to sub-optimal models. This paper proposes an alternative selection procedure based on the kernelized least angle regression (LARS)–least absolute shrinkage and selection operator (LASSO) method. By formulating the RBF neural network as a linear-in-the-parameters model, we derive an ℓ1-constrained objective function for training the network. The proposed algorithm can dynamically drop a previously selected regressor term that has become insignificant. Furthermore, inspired by LARS, the computation of the output weights in our algorithm is greatly simplified. Since the proposed algorithm conducts model structure selection and parameter optimization simultaneously, it builds networks with better generalization performance. Computational experiments with artificial and real-world data confirm the efficacy of the proposed algorithm.
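A hedged sketch of the formulation: the RBF network as a linear-in-the-parameters model with an ℓ1 penalty. sklearn's coordinate-descent Lasso stands in for the paper's kernelized LARS–LASSO solver; the sparsity pattern it finds plays the role of structure selection, with zero-weight centres dropped. The width and penalty values are illustrative.

```python
# Sketch: l1-penalized training of an RBF network built on candidate centres.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(150, 1))
y = np.sinc(X[:, 0]) + rng.normal(scale=0.05, size=150)

centres, width = X, 0.5                     # every sample as a candidate centre
Phi = np.exp(-((X - centres[:, 0]) ** 2) / (2 * width**2))  # RBF design matrix

model = Lasso(alpha=1e-3, max_iter=50_000).fit(Phi, y)
kept = np.flatnonzero(model.coef_)          # centres surviving the l1 penalty
print(f"{len(kept)} of {len(centres)} candidate centres retained")
```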

5.
Temporal generalization allows a trained classification algorithm to be applied to multiple images across time to derive reliable classification map products. It is a challenging remote-sensing research topic, since the results depend on the selection of atmospheric correction methods, classification algorithms, validation processes, and their varying combinations. This study examined the temporal generalization of sub-pixel vegetation mapping using multiple Landsat images (1990, 1996, 2004, and 2010). All Landsat images were processed with two atmospheric correction methods: simple dark object subtraction (DOS) and the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) algorithm. For the sub-pixel vegetation mapping of the 2004 Landsat image, we used high-resolution OrbView-3 images as a training/validation data set and compared three machine learning algorithms (neural networks, random forests, and classification and regression trees) for their classification performance. The trained classifiers were then applied to the other Landsat images (1990, 1996, and 2010) to derive sub-pixel vegetation map products. For the 2004 Landsat image classification, cross-validation shows similar results for the neural network (root mean square error (RMSE) = 0.099) and random forest (RMSE = 0.100) algorithms, both better than classification and regression trees (RMSE = 0.123). Pseudo-invariant pixels between 2004 and 2010 were used as validation points to evaluate the temporal generalizability of the classification algorithms. Simple DOS and LEDAPS atmospheric correction produced similar accuracy statistics. The neural-network-based classifier performed best at generating reliable sub-pixel vegetation map products across time.
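A minimal sketch of simple dark object subtraction (DOS), one of the two atmospheric correction methods compared in the study: per band, the darkest observed value is treated as atmospheric path radiance and subtracted from every pixel.

```python
# Sketch: per-band dark object subtraction on a multispectral image array.
import numpy as np

def dark_object_subtraction(image):
    """image: (rows, cols, bands) digital numbers; returns corrected array."""
    dark = image.reshape(-1, image.shape[-1]).min(axis=0)  # per-band minimum
    return np.clip(image - dark, 0, None)                  # no negative DNs

scene = np.random.default_rng(2).integers(10, 255, size=(64, 64, 6))
corrected = dark_object_subtraction(scene)
```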

6.
In this paper, we present a new methodology for learning the parameters of the multiple criteria classification method PROAFTN from data. Numerous representations and techniques are available for data mining, for example decision trees, rule bases, artificial neural networks, density estimation, regression, and clustering. The PROAFTN method constitutes another approach: it belongs to the class of supervised learning algorithms and assigns a membership degree of each alternative to the classes. PROAFTN requires the elicitation of its parameters for classification, so an automatic method is needed to establish these parameters from the given data with minimum classification error. Here, we propose a variable neighborhood search metaheuristic for obtaining these parameters. The performance of the proposed method was evaluated using 10-fold cross-validation, and the results are compared with those obtained by other classification methods previously reported on the same data. Solutions of substantially better quality are obtained with the proposed method than with the former ones.
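A generic variable neighborhood search (VNS) skeleton, sketched under the assumption of a real-valued parameter vector and neighborhoods of growing radius; the paper applies the same scheme to PROAFTN's parameters, whose encoding is not reproduced here.

```python
# Sketch: reduced VNS -- shake in neighborhood k, restart from the smallest
# neighborhood on improvement, otherwise widen the neighborhood.
import numpy as np

def vns(objective, x0, radii=(0.1, 0.5, 1.0), iters=200, seed=0):
    rng = np.random.default_rng(seed)
    best, best_f = np.asarray(x0, float), objective(x0)
    for _ in range(iters):
        k = 0
        while k < len(radii):
            cand = best + rng.normal(scale=radii[k], size=best.shape)  # shake
            f = objective(cand)
            if f < best_f:                 # improvement: restart from N_1
                best, best_f, k = cand, f, 0
            else:                          # no luck: try a wider neighborhood
                k += 1
    return best, best_f

# toy usage: minimize a bumpy 1-D function
x, f = vns(lambda v: float(np.sin(3 * v[0]) + 0.1 * v[0] ** 2), [4.0])
```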

7.
Gene selection is a necessary step for increasing the accuracy of machine learning algorithms that support disease diagnosis based on gene expression data; it is commonly known as a feature subset selection problem in the machine learning domain. A fast leave-one-out (LOO) evaluation formula for least-squares support vector machines (LS-SVMs) is introduced here to guide a backward feature selection process. Based on it, we propose a fast LOO-guided feature selection (LGFS) algorithm in which the gene selection step size is dynamically adjusted according to the LOO accuracy estimate. In our experiments, applying LGFS to gene selection improves classifier accuracy while reducing the number of features required. The smallest number of genes that maximizes the disease classification accuracy is determined automatically by the algorithm.
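A sketch of a fast LOO estimate for a bias-free LS-SVM (equivalently, kernel ridge regression) used as the guide in a backward elimination loop. The closed form r_i = alpha_i / (C^{-1})_ii avoids retraining n times; the paper's exact formula may differ (for instance, in how it handles the bias term), so treat this as an illustration of the principle.

```python
# Sketch: closed-form LOO residuals for (K + I/gamma) alpha = y, then a
# greedy backward feature elimination guided by that LOO score.
import numpy as np

def loo_error(X, y, gamma=10.0, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. dists
    C = np.exp(-sq / (2 * sigma**2)) + np.eye(len(y)) / gamma
    Cinv = np.linalg.inv(C)
    alpha = Cinv @ y
    return np.mean((alpha / np.diag(Cinv)) ** 2)          # mean sq. LOO residual

def backward_select(X, y):
    feats = list(range(X.shape[1]))
    best = loo_error(X, y)
    improved = True
    while improved and len(feats) > 1:
        improved = False
        for f in list(feats):                  # try dropping each feature
            trial = [g for g in feats if g != f]
            e = loo_error(X[:, trial], y)
            if e < best:                       # keep a drop that helps
                best, feats, improved = e, trial, True
                break
    return feats, best
```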

8.

A procedure to estimate wheat (Triticum aestivum L.) area using a sampling technique based on aerial photographs and digital Landsat MSS data was developed. Aerial photographs covering 720 km² were visually analysed. Computer classification of Landsat MSS data acquired on 4 September 1979 was performed using unsupervised and supervised algorithms, and the classification results were spatially filtered using a post-processing technique. To estimate wheat area, a regression approach was applied using different sample sizes and various sampling units. Based on four decision criteria proposed in this study, it was concluded that (i) as the size of the sampling unit decreased, the percentage of the sample area required to obtain similar estimation performance also decreased; (ii) the lowest percentage of area sampled for wheat estimation under the established precision and accuracy criteria was 13.09 per cent, using 10 km² as the sampling unit; and (iii) wheat-area estimates obtained by regression were more precise and accurate than those obtained by a direct expansion method.
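A worked sketch of the regression estimator used to expand the sampled wheat area to the whole region: the classified Landsat area x is known for every sampling unit, while the photo-interpreted "ground" area y is known only for the sample. All values below are illustrative, not from the study.

```python
# Sketch: regression estimator  Y_hat = N * [ y_bar + b * (X_bar - x_bar) ].
import numpy as np

rng = np.random.default_rng(3)
N = 72                                   # sampling units covering the region
x_all = rng.uniform(1, 8, size=N)        # classified wheat area per unit (km2)
y_all = 0.9 * x_all + rng.normal(scale=0.3, size=N)  # true area (unknown)

sample = rng.choice(N, size=10, replace=False)       # photo-interpreted units
xs, ys = x_all[sample], y_all[sample]
b = np.cov(xs, ys)[0, 1] / np.var(xs, ddof=1)        # regression slope
y_mean_hat = ys.mean() + b * (x_all.mean() - xs.mean())
print("regression estimate of total wheat area:", N * y_mean_hat)
```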

9.
徐雪松, 舒俭. Journal of Computer Applications (《计算机应用》), 2014, 34(8): 2285-2290.
To address the long computation time and low model-identification accuracy of traditional multi-model regression analysis, a new heuristic robust regression method is proposed. The method simulates the clustering-learning principle of the immune system, using a B-cell network to classify and store the data set: data are classified by judging how well they fit each model, which improves classification accuracy. The model-set extraction process is decomposed into repeated "clustering", "regression", and "re-clustering" attempts, and a parallel heuristic search approximates the solution of the model set. Simulation results show that the proposed method requires significantly less regression time than traditional algorithms and achieves significantly higher model-identification accuracy. On an 8-model data set, the best traditional algorithm, a RANSAC-based sequential extraction method, achieved an average model-identification accuracy of 90.37% in 53.3947 s, while traditional algorithms with computation times under 0.5 s achieved accuracy below 1%; the proposed algorithm required only 0.5094 s and reached 98.25% accuracy.
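A hedged sketch of the "clustering–regression–re-clustering" loop for multi-model data: points are (re)assigned to whichever line fits them best, then each line is refit. This plain alternating scheme is only a stand-in; the paper's immune/B-cell network additionally stores classifications and searches in parallel.

```python
# Sketch: alternate between assigning points to models and refitting models.
import numpy as np

def multi_model_fit(x, y, n_models=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    coefs = rng.normal(size=(n_models, 2))        # (slope, intercept) per model
    for _ in range(iters):
        preds = coefs[:, :1] * x + coefs[:, 1:]   # (n_models, n) predictions
        assign = np.abs(y - preds).argmin(axis=0) # cluster: best-fitting model
        for m in range(n_models):                 # regress: refit each model
            if (assign == m).sum() >= 2:
                coefs[m] = np.polyfit(x[assign == m], y[assign == m], 1)
    return coefs, assign

# toy usage: a mixture of two lines plus noise
rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 200)
y = np.where(rng.random(200) < 0.5, 2 * x + 1, -x + 5)
y += rng.normal(scale=0.2, size=200)
coefs, assign = multi_model_fit(x, y)
```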

10.
Simultaneous selection of variables and transformations in linear models
Variable selection and transformation selection are two distinct problems in linear models, and combining the two processes so that they are carried out simultaneously is of real interest. Owing to recent advances in computing, such a simultaneous procedure is now feasible. This paper proposes two methods for the simultaneous selection of variables and transformations in linear models. The first method is a purely simultaneous selection procedure; the second is suited to data sets with many predictors, and a backward-deletion procedure for simultaneous selection is also proposed. Both methods are based on Bayesian model selection criteria, and a worked example illustrates them.
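A small sketch of the simultaneous idea: every (variable subset, transformation) pair is scored, here with BIC as a simple stand-in for the Bayesian model selection criteria the paper builds on, and assuming positive predictors so that log and sqrt are defined. Exhaustive search is feasible only for a few predictors, which is exactly why a backward-deletion variant is needed for larger problems.

```python
# Sketch: exhaustive BIC search over variable subsets and transformations.
import itertools
import numpy as np

TRANSFORMS = {"id": lambda v: v, "log": np.log, "sqrt": np.sqrt}

def bic(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = ((y - X1 @ beta) ** 2).sum()
    n, k = X1.shape
    return n * np.log(rss / n) + k * np.log(n)

def select(X, y):
    best = (np.inf, None)
    p = X.shape[1]
    for size in range(1, p + 1):
        for subset in itertools.combinations(range(p), size):
            for names in itertools.product(TRANSFORMS, repeat=size):
                Z = np.column_stack([TRANSFORMS[t](X[:, j])
                                     for t, j in zip(names, subset)])
                best = min(best, (bic(Z, y), (subset, names)))
    return best     # (best BIC, (chosen variables, chosen transformations))
```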

11.
The paper introduces an efficient construction algorithm for obtaining sparse linear-in-the-weights regression models by directly optimizing model generalization capability. This is achieved through the delete-1 cross-validation concept and the associated leave-one-out test error, also known as the predicted residual sums of squares (PRESS) statistic, without resorting to any separate validation data set during model construction. Computational efficiency is ensured by an orthogonal forward regression, but the algorithm incrementally minimizes the PRESS statistic instead of the usual sum of squared training errors. A local regularization method can naturally be incorporated into the model selection procedure to further enforce model sparsity. The proposed algorithm is fully automatic: the user is not required to specify any criterion for terminating the model construction procedure. Comparisons with existing state-of-the-art modeling methods and several examples demonstrate the algorithm's ability to construct sparse models that generalize well.
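A hedged sketch of forward selection driven by PRESS rather than the training error. For speed the paper orthogonalizes the regressors; here PRESS is recomputed from scratch at each step via the standard identity e_loo_i = e_i / (1 - h_ii), which is far slower but easy to follow, and the construction stops automatically when PRESS stops decreasing.

```python
# Sketch: greedy forward regressor selection that minimizes the PRESS statistic.
import numpy as np

def press(X, y):
    H = X @ np.linalg.pinv(X)                      # hat matrix
    e = y - H @ y
    return np.sum((e / (1 - np.diag(H))) ** 2)

def forward_press(candidates, y):
    """candidates: (n, p) pool of regressors; add terms while PRESS drops."""
    chosen, best = [], np.inf
    while len(chosen) < candidates.shape[1]:
        scores = [(press(candidates[:, chosen + [j]], y), j)
                  for j in range(candidates.shape[1]) if j not in chosen]
        s, j = min(scores)
        if s >= best:                              # automatic stopping rule
            break
        chosen, best = chosen + [j], s
    return chosen, best
```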

12.
Research on optimal model selection for support vector machines
By studying the kernel matrix and exploiting its symmetric positive-definiteness, a kernel-alignment-based algorithm for optimal SVM model selection, called OMSA, is proposed. It searches for the optimal kernel parameters and the corresponding optimal learning model directly from the training samples, without going through the standard SVM training and testing process, thereby overcoming the strong empiricism and high computational cost of traditional SVM model selection. Experiments on UCI benchmark data sets and the FERET face database show that the kernel parameters and the corresponding kernel matrix found by the algorithm are optimal, yielding the SVM classifier with the lowest error rate. The algorithm provides a feasible approach to optimal SVM model selection and is also of reference value for other kernel-based learning methods.
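A minimal sketch of kernel-target alignment, the kind of quantity such kernel-calibration model selection maximizes: the Frobenius inner product between a candidate kernel matrix and the ideal kernel yy^T, normalized, so that a kernel parameter can be scored without any SVM training or testing. The exact criterion in the paper may differ.

```python
# Sketch: choose an RBF kernel width by maximizing kernel-target alignment.
import numpy as np

def alignment(K, y):
    Kyy = np.outer(y, y)                               # ideal target kernel
    return (K * Kyy).sum() / np.sqrt((K * K).sum() * (Kyy * Kyy).sum())

def best_rbf_gamma(X, y, gammas):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return max(gammas, key=lambda g: alignment(np.exp(-g * sq), y))

# toy usage with labels in {-1, +1}
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=60))
print("selected gamma:", best_rbf_gamma(X, y, [0.01, 0.1, 1.0, 10.0]))
```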

13.
Variable selection serves a dual purpose in statistical classification problems: it identifies the input variables that separate the groups well, and a classification rule based on these variables frequently has a lower error rate than the rule based on all the input variables. Kernel Fisher discriminant analysis (KFDA) is a recently proposed, powerful classification procedure, frequently applied in cases characterised by large numbers of input variables. This paper addresses the important problem of eliminating redundant input variables before implementing KFDA. A backward elimination approach is recommended, and two criteria for the recursive elimination of input variables are proposed and investigated. Their performance is evaluated on several data sets and in a simulation study.

14.
15.
This article explores a non-linear partial least squares (NLPLS) regression method for estimating bamboo forest carbon stock from Landsat Thematic Mapper (TM) data. Two schemes are used to build models: leave-one-out (LOO) cross-validation (scheme 1) and split-sample validation (scheme 2). For each scheme, the NLPLS model is compared with a linear partial least squares (LPLS) regression model and a multivariate linear model based on ordinary least squares (LOLS). The research indicates that an optimized NLPLS regression model can substantially improve the estimation accuracy of Moso bamboo (Phyllostachys heterocycla var. pubescens) carbon stock, providing a new method for estimating biophysical variables from remotely sensed data.
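A brief sketch of the two validation schemes wrapped around a PLS regression. sklearn's linear PLSRegression stands in for the article's non-linear PLS, and the predictors are synthetic; the point is only the scheme-1 (LOO) versus scheme-2 (split-sample) comparison.

```python
# Sketch: LOO cross-validation vs. split-sample validation of a PLS model.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 6))                 # e.g. per-pixel TM band metrics
y = X @ rng.normal(size=6) + rng.normal(scale=0.2, size=80)

pls = PLSRegression(n_components=3)
loo_mse = -cross_val_score(pls, X, y, cv=LeaveOneOut(),       # scheme 1
                           scoring="neg_mean_squared_error").mean()
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25,   # scheme 2
                                      random_state=0)
split_mse = np.mean((yte - pls.fit(Xtr, ytr).predict(Xte).ravel()) ** 2)
print(f"LOO MSE {loo_mse:.3f}   split-sample MSE {split_mse:.3f}")
```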

16.
Traditional fast k-nearest neighbor search algorithms based on pyramid structures require either a large amount of extra memory or long search times. This paper proposes a fast k-nearest neighbor search algorithm based on the wavelet transform, which exploits important information hidden in the transform coefficients to reduce the computational complexity. The study shows that the Haar wavelet transform yields two kinds of useful pyramids, and two elimination criteria derived from the transform coefficients are used to reject impossible candidates. Experimental results on texture classification verify the effectiveness of the proposed algorithm.
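A compact sketch of the rejection idea: an orthonormal transform preserves Euclidean distance, so the distance computed on the first (coarsest) Haar coefficient alone, which is just a scaled vector mean, lower-bounds the true distance and lets impossible candidates be rejected before any full d-dimensional computation. The paper's two criteria use more of the pyramid; this shows only the cheapest level.

```python
# Sketch: k-NN search with a coarse Haar-coefficient lower bound for rejection.
import numpy as np

def knn_search(query, data, k):
    d = data.shape[1]
    coarse = np.sqrt(d) * data.mean(axis=1)        # first Haar coefficient
    qc = np.sqrt(d) * query.mean()
    order = np.argsort(np.abs(coarse - qc))        # most promising first
    best = []                                      # list of (dist, idx)
    rejected = 0
    for i in order:
        if len(best) == k and abs(coarse[i] - qc) >= best[-1][0]:
            rejected += 1                          # lower bound already too big
            continue
        dist = np.linalg.norm(data[i] - query)     # full distance only if needed
        best = sorted(best + [(dist, i)])[:k]
    return best, rejected
```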

17.
Beware of q²!

18.
A post-processing technique for support vector machine (SVM) algorithms for binary classification problems is introduced in order to obtain adequate accuracy on a priority class (labelled the positive class). That is, the true positive rate (recall, or sensitivity) is prioritized over the overall accuracy of the classifier; hence, false negative (Type II) errors receive greater consideration than false positive (Type I) errors during construction of the model. The technique tunes the initial bias term, once a solution vector has been learned by a standard SVM algorithm, in two steps: first, a fixed threshold is imposed as a lower bound for the recall measure; second, the true negative rate (specificity) is maximized. Experiments on eleven standard UCI data sets show that the modified SVM satisfies the aims for which it was designed, with results comparable to or better than those obtained by other state-of-the-art SVM algorithms under the usual metrics.
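A sketch of the two-step bias post-tuning on top of a standard SVM: the learned weight vector is kept and only the threshold slides. Step 1 enforces a recall floor on the positive class; step 2 takes, among the thresholds satisfying it, the one with the best specificity. Data and parameter values here are illustrative.

```python
# Sketch: re-tune the SVM bias for a recall floor, then maximize specificity.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, weights=[0.8], random_state=0)
svm = LinearSVC(dual=False).fit(X, y)
scores = svm.decision_function(X)            # w.x + b, bias to be re-tuned

def tune_bias(scores, y, recall_floor=0.95):
    best = None
    for t in np.sort(scores):                # candidate thresholds
        pred = (scores >= t).astype(int)
        recall = pred[y == 1].mean()
        if recall >= recall_floor:           # step 1: recall constraint
            specificity = (1 - pred)[y == 0].mean()
            if best is None or specificity > best[0]:
                best = (specificity, t)      # step 2: maximize specificity
    return best

spec, threshold = tune_bias(scores, y)
print(f"specificity {spec:.3f} at threshold {threshold:.3f}")
```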

19.
An important step in building expert and intelligent systems is obtaining the knowledge they will use. This knowledge can be obtained from experts or, nowadays more often, from machine learning processes applied to large volumes of data. For some of these learning processes, however, if the volume of data is large, the knowledge extraction phase becomes very slow or even impossible. Moreover, the data sets used for learning often originate from measurement processes whose collected data can contain errors, so the presence of noise in the data is inevitable. In such environments, an initial step of noise filtering and data set size reduction plays a fundamental role; for both tasks, instance selection emerges as a possible solution that has proved useful in various fields. In this paper we focus mainly on instance selection for noise removal. In contrast to most existing methods, which apply instance selection to classification tasks (discrete prediction), the proposed approach yields instance selection methods for regression tasks (prediction of continuous values). The different nature of the value to predict poses an extra difficulty, which explains the small number of articles on instance selection for regression. More specifically, the idea used in this article to adapt "classic" classification instance-selection algorithms to regression problems is as simple as discretizing the numerical output variable. In the experiments, the proposed method is compared with much more sophisticated methods specifically designed for regression and proves to be very competitive. The main contributions of the paper are: (i) a simple way to adapt classification instance-selection algorithms to regression, (ii) the use of this approach to adapt the popular noise filter ENN (edited nearest neighbor), and (iii) a comparison of this noise filter against two others specifically designed for regression, in which it proves very competitive despite its simplicity.
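A hedged sketch of the paper's adaptation as I read it: discretize the continuous target, then run classic ENN, dropping any instance whose discretized label disagrees with the majority label of its k nearest neighbours. The bin count and k are illustrative choices.

```python
# Sketch: ENN noise filtering for regression via a discretized output variable.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def enn_regression(X, y, n_bins=5, k=3):
    bins = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    labels = np.digitize(y, bins)                  # discretized output classes
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                      # idx[:, 0] is the point itself
    keep = []
    for i, neigh in enumerate(idx[:, 1:]):
        counts = np.bincount(labels[neigh], minlength=n_bins)
        if counts.argmax() == labels[i]:           # agrees with its neighbourhood
            keep.append(i)
    return np.array(keep)                          # indices of retained instances
```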

20.
Image recognition with an adaptively boosted convolutional neural network
Objective: To further improve the convergence and recognition accuracy of convolutional neural networks (CNNs) and enhance their generalization ability, an adaptively boosted CNN image recognition algorithm is proposed. Method: An adaptive boosting model is built; the causes of errors in CNN classification and the error feedback mode are analyzed, and training is deliberately targeted at classification errors, so that classification features are adaptively boosted according to the iteration count and recognition results and the CNN weights are optimally adjusted. The adaptively boosted CNN is compared with several algorithms in terms of convergence speed and recognition accuracy, and its generalization ability is tested on multiple data sets. Results: The comparison experiments show that the algorithm substantially improves convergence, raising both convergence speed and recognition accuracy: at convergence, the misrecognition rate is reduced by 20.93% on a handwritten-digit data set and by 11.82% and 15.12% on handwritten-letter and hyperspectral-image data sets. Compared with other CNN optimization algorithms, the misrecognition rate is reduced by up to 58.29% and 43.50% relative to the dynamic adaptive pooling and dual-optimization algorithms; with optimization based on different gradient algorithms, the misrecognition rate is reduced by up to 33.11%; and compared with other image recognition algorithms, the recognition rate is also considerably improved. Conclusion: The experimental results show that the adaptively boosted CNN algorithm achieves adaptive boosting of classification features, markedly improves convergence and recognition accuracy, and generalizes well across multiple data sets. The adaptive boosting model can be further extended to other CNN-related deep learning algorithms.
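A loose sketch of the general idea of training that targets classification errors: per round, samples the current model misclassifies receive larger weights, so subsequent fitting concentrates on them. This AdaBoost-style reweighting loop with a simple classifier is my stand-in illustration only; the paper's model adapts CNN feature maps and weights directly, which is not reproduced here.

```python
# Sketch: error-targeted training via sample reweighting across rounds.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def error_boosted_training(X, y, rounds=10):
    w = np.full(len(y), 1.0 / len(y))                 # uniform sample weights
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    for _ in range(rounds):
        model.fit(X, y, sample_weight=w)
        wrong = model.predict(X) != y
        w[wrong] *= 2.0                               # boost the errors
        w /= w.sum()                                  # renormalize
    return model
```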
