Similar Documents
20 similar documents retrieved.
1.
This paper introduces a new ensemble approach, Feature-Subspace Aggregating (Feating), which builds local models instead of global models. Feating is a generic ensemble approach that can enhance the predictive performance of both stable and unstable learners, whereas most existing ensemble approaches can improve the predictive performance of unstable learners only. Our analysis shows that the new approach reduces the time needed to generate each model in an ensemble through an increased level of localisation in Feating. Our empirical evaluation shows that Feating performs significantly better than Boosting, Random Subspace and Bagging in terms of predictive accuracy when a stable learner, SVM, is used as the base learner. The speed-up achieved by Feating makes SVM ensembles feasible for large data sets on which they would otherwise be impractical. When SVM is the preferred base learner, we show that Feating SVM performs better than Boosting decision trees and Random Forests. We further demonstrate that Feating also substantially reduces the error of another stable learner, k-nearest neighbour, and of an unstable learner, decision tree.
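As a rough illustration of the local-model idea (not the authors' algorithm), the sketch below partitions training data by the value of a single feature and fits a trivial majority-class model in each region; real Feating enumerates feature subspaces and trains a full base learner such as an SVM in each local region.

```python
from collections import Counter, defaultdict

def feating_like_train(X, y, feat):
    """Partition training data by the value of one feature and fit a
    trivial local model (the majority class) per partition. A minimal
    sketch of building local rather than global models."""
    regions = defaultdict(list)
    for xi, yi in zip(X, y):
        regions[xi[feat]].append(yi)
    return {v: Counter(ys).most_common(1)[0][0] for v, ys in regions.items()}

def feating_like_predict(model, x, feat, default=0):
    # Route the query to the local model for its region of feature space.
    return model.get(x[feat], default)

X = [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]
y = [0, 0, 1, 1]
m = feating_like_train(X, y, feat=0)
print(feating_like_predict(m, (1, 'a'), feat=0))  # prints 1
```

Each region's model only ever sees local data, which is what shortens per-model training time as localisation increases.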

2.
This study presents the applicability of support vector machine (SVM) ensembles to traffic incident detection. The SVM has been proposed for traffic incident detection because it produces a nonlinear classifier with good generalization ability and has exhibited performance comparable to that of neural networks. However, the classification result of a practically implemented SVM depends on the choice of kernel function and parameters. To avoid the burden of choosing kernel functions and tuning their parameters, and to improve on the limited classification performance of a single SVM, we propose using SVM ensembles to detect incidents. In addition, we propose a new aggregation method that combines SVM classifiers based on certainty. Moreover, we propose a hybrid performance index (PI) to evaluate the performance of SVM ensembles for incident detection by combining the common criteria: detection rate (DR), false alarm rate (FAR), mean time to detection (MTTD), and classification rate (CR). Several SVM ensembles were developed based on bagging, boosting and cross-validation committees with different combining approaches, and tested on real data collected on the I-880 Freeway in California. The experimental results show that the SVM ensembles outperform single-SVM-based automatic incident detection in terms of DR, FAR, MTTD, CR and PI. A non-parametric test, the Wilcoxon signed-ranks test, was used to compare six combining schemes; our proposed combining method performs as well as majority vote and weighted vote. Finally, we also investigated the influence of ensemble size on detection performance.
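The abstract does not spell out the certainty measure, so the sketch below is only illustrative: each binary classifier votes incident/no-incident, and its vote is weighted by how far its posterior probability sits from 0.5.

```python
def certainty_vote(probs):
    """Certainty-weighted combination of binary incident classifiers:
    each classifier votes +1 (incident) or -1 (no incident), weighted by
    the distance of its posterior from 0.5 (an assumed, illustrative
    certainty measure, not necessarily the paper's rule)."""
    score = sum((1 if p >= 0.5 else -1) * abs(p - 0.5) for p in probs)
    return 1 if score >= 0 else 0

# One confident detector outweighs two hesitant dissenters.
print(certainty_vote([0.90, 0.45, 0.40]))  # prints 1
```

With plain majority vote the same inputs would give 0, which is the point of weighting by certainty.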

3.
Fault samples for aero-engines are limited, so fault diagnosis with traditional statistical recognition methods yields low accuracy. The support vector machine (SVM) can solve small-sample fault classification and recognition problems. This work studies the influence of the SVM kernel function on recognition accuracy and compares SVM with the maximum-likelihood, Mahalanobis-distance and minimum-distance methods. The results show that the choice of SVM kernel function has little influence on fault-recognition accuracy, and the SVM-based aero-engine...

4.
AdaBoost is a highly effective ensemble learning method that combines several weak learners to produce a strong committee with higher accuracy. However, like other ensemble methods, AdaBoost uses a large number of base learners to produce the final outcome when addressing high-dimensional data, which poses a critical challenge in the form of high memory consumption. Feature selection methods can significantly reduce dimensionality in regression and have been established to be applicable to ensemble pruning. By pruning the ensemble, it is possible to generate a simpler ensemble with fewer base learners yet higher accuracy. In this article, we propose using the minimax concave penalty (MCP) function to prune an AdaBoost ensemble, simplifying the model and improving its accuracy simultaneously. The MCP penalty function is compared with LASSO and SCAD in terms of pruning performance. Experiments on real datasets demonstrate that MCP pruning outperforms the other two methods: it reduces the ensemble size effectively and generates marginally more accurate predictions than the unpruned AdaBoost model.
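The MCP function itself has a standard closed form; a minimal sketch (the pruning step then applies it to the base-learner coefficients):

```python
def mcp(t, lam, gamma):
    """Minimax concave penalty MCP(t; lam, gamma):
    lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, and the constant
    gamma*lam^2/2 beyond that. Near zero it behaves like the LASSO
    penalty, but it flattens out, so large coefficients are not
    over-shrunk the way LASSO shrinks them."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam
```

At `t = gamma*lam` the two branches meet (both equal `gamma*lam**2/2`), so the penalty is continuous; beyond that point its derivative is zero, which is what removes the shrinkage bias on large coefficients.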

5.
Classification is the most widely used supervised machine learning method. Because each of the many existing classification algorithms can perform poorly on some data, various attempts have been made to improve the original algorithms by combining them. Some of the best-known results are produced by ensemble methods such as bagging and boosting. We developed a new ensemble method called allocation. The allocation method uses an allocator, an algorithm that separates data instances based on anomaly detection and allocates each to one of several micro classifiers, built with existing classification algorithms on subsets of the training data. The outputs of the micro classifiers are then fused into one final classification. Our goal was to improve on the results of the original classifiers with this new allocation method and to compare its classification results with existing ensemble methods. The allocation method was tested on 30 benchmark datasets with six well-known basic classification algorithms (J48, NaiveBayes, IBk, SMO, OneR and NBTree). The obtained results were compared to those of the basic classifiers as well as other ensemble methods (bagging, MultiBoost and AdaBoost). Results show that our allocation method is superior to the basic classifiers and to the tested ensembles in both classification accuracy and F-score. The statistical analysis, considering all of the classification algorithms used, confirmed that the allocation method performs significantly better on both measures. Although the differences are not significant for each basic classifier alone, the allocation method achieved the biggest improvements on all six basic classification algorithms. The allocation method thus proved to be a competitive ensemble method for classification that can be used with various classification algorithms and may outperform other ensembles on different types of data.

6.
This article proposes a new approach to improve the classification performance of remotely sensed images with an aggregative model based on a classifier ensemble (AMCE). AMCE is a multi-classifier system with two procedures: ensemble learning and prediction combination. Two ensemble algorithms (Bagging and AdaBoost.M1) were used in the ensemble learning process to stabilize and improve the performance of single classifiers (maximum likelihood classifier, minimum distance classifier, back-propagation neural network, classification and regression tree, and support vector machine (SVM)). Prediction results from the single classifiers were integrated according to a diversity measurement based on an averaged double-fault indicator and different combination strategies (weighted vote, Bayesian product, logarithmic consensus, and behaviour knowledge space). The suitability of the AMCE model was examined using a Landsat Thematic Mapper (TM) image of Dongguan city (Guangdong, China), acquired on 2 January 2009. Experimental results show that the proposed model was significantly better than the most accurate single classifier (SVM) in terms of classification accuracy (from 88.83% to 92.45%) and kappa coefficient (from 0.8624 to 0.9088). A stepwise comparison illustrates that both ensemble learning and prediction combination with the AMCE model improved classification.
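The averaged double-fault indicator builds on the pairwise double-fault measure, which can be sketched as follows:

```python
def double_fault(pred_a, pred_b, truth):
    """Double-fault diversity for one classifier pair: the fraction of
    samples that BOTH classifiers misclassify. Lower values indicate a
    more complementary (diverse) pair; averaging it over all pairs gives
    an ensemble-level diversity measurement."""
    both_wrong = sum(1 for a, b, t in zip(pred_a, pred_b, truth)
                     if a != t and b != t)
    return both_wrong / len(truth)

# Two classifiers that fail on different samples score low (diverse).
print(double_fault([0, 1, 0, 1], [1, 1, 0, 0], [1, 1, 1, 1]))  # prints 0.25
```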

7.
Application of Support Vector Machines to the Classification of Hyperspectral Remote-Sensing Images
许将军  赵辉 《计算机仿真》2009,26(12):164-167
Hyperspectral remote-sensing images are high-dimensional, and when samples are few, traditional statistical recognition methods give low classification accuracy. The support vector machine (SVM) can solve small-sample, high-dimensional, nonlinear classification problems. The original image is first preprocessed by normalization; the influence of different SVM kernel functions on classification accuracy is then analyzed, and the SVM classification results are compared with those of the minimum-distance and Mahalanobis-distance methods. The results show that the SVM kernel type has little effect on classification accuracy, and that SVM achieves higher accuracy than the traditional statistical recognition methods.

8.
A new hyperspectral remote-sensing image classification method is proposed that combines random subspaces with an ensemble of kernel extreme learning machines (ELMs). First, the random subspace method generates multiple equal-size feature subsets from the full feature set of the hyperspectral data; then a kernel ELM is trained on each subset to obtain base classifiers; finally, the outputs of all base classifiers are combined by a voting mechanism to produce the classification result. Experimental results on hyperspectral remote-sensing data sets show that the proposed method improves classification and achieves higher overall accuracy than the kernel ELM and random forest methods.
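The random-subspace step described above can be sketched as follows (sizes are illustrative, and the kernel-ELM training on each subset is omitted):

```python
import random

def random_subspaces(n_features, subset_size, n_subsets, seed=0):
    """Draw n_subsets equal-size random feature subsets from the full
    feature set; each subset would train one kernel-ELM base classifier,
    whose outputs are then combined by voting."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [sorted(rng.sample(range(n_features), subset_size))
            for _ in range(n_subsets)]

subsets = random_subspaces(n_features=200, subset_size=50, n_subsets=5)
```

Sampling without replacement inside each subset guarantees no band is duplicated within a base classifier, while different subsets may overlap, which is what gives the ensemble its diversity.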

9.
Web spam detection is a major challenge for search engines. This paper proposes an ensemble learning method based on genetic programming (GPENL) to detect web spam. The method first draws t different training sets from the original training set by undersampling; it then trains them with c different classification algorithms to obtain t*c base classifiers; finally, genetic programming evolves the way the t*c base classifiers are combined. The new method not only fuses undersampling with ensemble learning to improve classification on imbalanced data sets, but also conveniently integrates base classifiers of different types. Experiments on the WEBSPAM-UK2006 data set show that GPENL improves classification for both homogeneous and heterogeneous ensembles, with heterogeneous ensembles being more effective, and that GPENL achieves a higher F-measure than AdaBoost, Bagging, RandomForest, majority-vote ensembles, the EDKC algorithm, and a method based on Prediction Spamicity.
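The first GPENL step, undersampling the majority class into t balanced training sets, might look like the sketch below; the GP-evolved combination stage is omitted.

```python
import random

def undersample_sets(minority, majority, t, seed=0):
    """Build t balanced training sets: each keeps every minority (spam)
    sample and pairs it with an equal-size random draw, without
    replacement, from the majority (non-spam) class."""
    rng = random.Random(seed)
    return [minority + rng.sample(majority, len(minority)) for _ in range(t)]

spam = ['s1', 's2', 's3']
ham = ['h%d' % i for i in range(10)]
balanced = undersample_sets(spam, ham, t=4)
```

Each of the t sets sees a different slice of the majority class, so the t*c base classifiers trained on them disagree usefully.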

10.
Failure mode (FM) and bearing capacity of reinforced concrete (RC) columns are key concerns in structural design and performance assessment. The failure type, i.e., flexure, shear, or a mix of the two, greatly affects the capacity and ductility of the structure, and the design methodologies for structures with different failure types differ completely. Therefore, developing efficient and reliable methods to identify the FM and predict the corresponding capacity is of special importance for structural design and assessment. In this paper, an intelligent approach is presented for FM classification and bearing-capacity prediction of RC columns based on ensemble machine learning techniques. The most typical ensemble learning method, the adaptive boosting (AdaBoost) algorithm, is adopted for both the classification and the regression (prediction) problem. In total, 254 cyclic loading tests of RC columns are collected. The geometric dimensions, reinforcing details and material properties are set as input variables, while the failure types (for classification) and peak capacity forces (for regression) are the output variables. The results indicate that the model generated by the AdaBoost algorithm is highly accurate for both FM classification (accuracy = 0.96) and capacity prediction (R2 = 0.98). Different learning algorithms are also compared, and the results show that ensemble learning (especially AdaBoost) performs better than single learners. In addition, the bearing capacity predicted by AdaBoost is compared with that given by the empirical formulas of design codes, which shows the clear superiority of the proposed method. In summary, machine learning techniques, especially ensemble learning, can provide an alternative to conventional mechanics-driven models in structural design in this era of big data.
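As a toy illustration of the AdaBoost loop used for the classification task (decision stumps stand in for the base learners, labels are +/-1, and nothing here reproduces the paper's column dataset or results):

```python
import math

def stump_train(X, y, w):
    """Best single-feature threshold stump under sample weights w."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[f] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def adaboost_train(X, y, rounds=5):
    """Discrete AdaBoost: reweight samples each round so the next stump
    concentrates on the ones the previous stump misclassified."""
    n = len(X)
    w = [1.0 / n] * n
    committee = []
    for _ in range(rounds):
        err, f, thr, sign = stump_train(X, y, w)
        err = max(err, 1e-12)  # guard the log for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        committee.append((alpha, f, thr, sign))
        w = [wi * math.exp(-alpha * yi * (sign if xi[f] >= thr else -sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return committee

def adaboost_predict(committee, x):
    score = sum(alpha * (sign if x[f] >= thr else -sign)
                for alpha, f, thr, sign in committee)
    return 1 if score >= 0 else -1
```

For regression, AdaBoost.R2-style variants replace the stump and the weight-update rule, but the round-by-round reweighting idea is the same.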

11.
Multi-scale kernel methods are a current focus in kernel machine learning. Multi-scale kernel learning typically suffers, in the multiple-kernel setting, from drawbacks such as plain averaging of the kernels, long iterative training times, and empirically chosen combination coefficients. Based on a kernel-target measure, this paper proposes an adaptive sequential learning algorithm for multi-scale kernels that computes the kernel weighting coefficients automatically and quickly. Experiments show that the method outperforms single-kernel support vector machines in regression precision and classification accuracy, with more stable function fitting and classification, demonstrating the algorithm's general applicability.
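A common kernel-target measure is the kernel-target alignment; the sketch below computes it, and using the (normalized) alignment scores directly as per-scale kernel weights is an assumption for illustration, not necessarily the paper's rule.

```python
import math

def kernel_target_alignment(K, y):
    """Alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F) between a kernel
    matrix K and labels y in {-1, +1}. Higher alignment means the kernel
    better matches the target, so each scale's kernel can be scored
    (and weighted) by it."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    k_norm = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    return num / (k_norm * n)  # ||yy^T||_F = n for +/-1 labels

y = [1, -1]
K_ideal = [[1, -1], [-1, 1]]  # exactly yy^T: perfect alignment
print(kernel_target_alignment(K_ideal, y))  # prints 1.0
```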

12.
Kernel Functions in SVM-RFE-Based Band Selection for Hyperspectral Data
Support vector machine recursive feature elimination (SVM-RFE) is inefficient when applied to band selection for hyperspectral data, since it usually uses a nonlinear kernel and retrains the SVM after every band deletion. Recent research shows that an SVM with a nonlinear kernel does not always outperform a linear one in classification. Similarly, it is uncertain which kernel is better for SVM-RFE-based band selection. This paper compares the classification results of SVM-RFE with the two SVMs, then designs two optimization strategies to accelerate the band-selection process: the percentage-accelerated method and the fixed-accelerated method. Through an experiment on AVIRIS hyperspectral data, this paper found: (1) the classification precision of SVM decreases slightly as redundant bands increase, so SVM classification benefits from feature selection in terms of accuracy; (2) the band collection selected by SVM-RFE with a linear SVM has higher classification accuracy and fewer effective bands than that with a nonlinear SVM; (3) both optimization strategies improve the efficiency of feature selection, and percentage elimination outperforms fixed elimination in terms of computational efficiency and classification accuracy.
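The two acceleration strategies differ only in how many bands are dropped per SVM-RFE round; the elimination schedule can be sketched as below (function and parameter names are illustrative, and the SVM ranking itself is omitted):

```python
def rfe_schedule(n_bands, keep, fixed=None, fraction=None):
    """Band counts remaining after each SVM-RFE iteration.
    fixed:    drop a constant number of lowest-ranked bands per round;
    fraction: drop that fraction of the remaining bands per round, which
              needs far fewer (and progressively cheaper) SVM retrainings."""
    schedule, n = [n_bands], n_bands
    while n > keep:
        drop = fixed if fixed is not None else max(1, int(n * fraction))
        n = max(keep, n - drop)
        schedule.append(n)
    return schedule

print(rfe_schedule(220, keep=20, fraction=0.5))  # prints [220, 110, 55, 28, 20]
```

With 220 AVIRIS-like bands, one-at-a-time elimination needs 200 retrainings, while halving reaches 20 bands in 4, which is the source of the speed-up.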

13.
Given the important role of financial distress prediction (FDP) for enterprises, it is crucial to improve the accuracy of FDP models. In recent years, classifier ensembles have shown a promising advantage over single classifiers, but the study of classifier ensemble methods for FDP is still not comprehensive and merits further exploration. This paper constructs AdaBoost ensembles with a single attribute test (SAT) and with a decision tree (DT) as weak learners for FDP, and empirically compares them with a single DT and a support vector machine (SVM). After presenting the framework of the AdaBoost ensemble method for FDP, the article describes the AdaBoost algorithm and the SAT and DT algorithms in detail, followed by the mechanism for combining the multiple classifiers. On an initial sample of 692 Chinese listed companies and 41 financial ratios, 30 holdout experiments are carried out for FDP one, two, and three years in advance. The experimental results show that the AdaBoost ensemble with SAT outperforms the AdaBoost ensemble with DT, the single DT classifier and the single SVM classifier. In conclusion, the choice of weak learner is crucial to the performance of an AdaBoost ensemble, and the AdaBoost ensemble with SAT is more suitable for FDP of Chinese listed companies.

14.
For multi-class problems, this paper proposes a classification method based on the confusion matrix and ensemble learning. Starting from the similarity relations between patterns, a hierarchical classifier structure is generated from the confusion matrix. Support vector machines (SVMs) serve as the basic two-class classifiers, and for SVMs whose classification accuracy is unsatisfactory, the AdaBoost algorithm is applied to form a weighted vote over SVM classifiers. Taking target recognition in substation environment monitoring as an example (involving people, animals, ordinary red-yellow flames, white flames, and incandescent lamps), target classification for substation monitoring is implemented. Experiments show that the proposed method effectively improves classification accuracy.
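A sketch of the first step, finding the most-confused class pair from a confusion matrix; such similar pairs would then be grouped into one node of the hierarchical structure. The class names echo the monitoring example but the counts are made up.

```python
from collections import Counter

def most_confused_pair(truth, pred):
    """Count off-diagonal confusion entries and return the (true,
    predicted) class pair that is confused most often; grouping the
    most-similar patterns first is the idea behind the layered
    classifier structure."""
    off_diag = Counter((t, p) for t, p in zip(truth, pred) if t != p)
    return off_diag.most_common(1)[0][0]

truth = ['person', 'person', 'flame', 'white_flame', 'white_flame', 'lamp']
pred  = ['person', 'animal', 'flame', 'lamp',        'lamp',        'lamp']
print(most_confused_pair(truth, pred))  # prints ('white_flame', 'lamp')
```

White flames and incandescent lamps plausibly confuse a flat classifier, so a hierarchy would first separate that merged group from the rest, then split it internally.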

15.
The recent abundance of moderate-to-high spatial resolution satellite imagery has facilitated land-cover map production. However, in cloud-prone areas, building high-resolution land-cover maps is still challenging due to infrequent satellite revisits and a lack of cloud-free data. We propose a classification method for cloud-persistent areas with high temporal dynamics of land-cover types. First, compositing techniques are employed to create dense time-series composite images from all available Landsat 8 images. Then, spectral-temporal features are extracted to train an ensemble of five supervised classifiers. The resulting composite images are clear, with at least 99.78% cloud-free pixels, and are on average 20.47% better than the original images. We classify seven land classes (paddy rice, cropland, grass/shrub, trees, bare land, impervious area, and waterbody) over Hanoi, Vietnam, in 2016. Using a time series of composites significantly improves classification performance, with 10.03% higher overall accuracy (OA) than single-composite classification. Additionally, using the time series of composites together with the ensemble technique, which combines the best of the five tested classifiers (eXtreme Gradient Boosting, logistic regression, Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, SVM with a linear kernel, and a multilayer perceptron), performed best, with 84% OA and a 0.79 kappa coefficient.
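The compositing step can be pictured per pixel as below; the median rule is an assumption for illustration, since the abstract does not name the exact compositing technique.

```python
import statistics

def composite_pixel(reflectances, cloud_flags):
    """One pixel of a time-series composite: keep only the cloud-free
    observations and reduce them to a single clear value (here, the
    median). Applied to every pixel, this turns a stack of partly
    cloudy scenes into one nearly cloud-free image."""
    clear = [v for v, cloudy in zip(reflectances, cloud_flags) if not cloudy]
    return statistics.median(clear) if clear else None

# The cloudy 0.90 observation is masked out of the composite.
print(composite_pixel([0.10, 0.90, 0.12, 0.11], [False, True, False, False]))  # prints 0.11
```

Computing several such composites across the year yields the dense time series whose spectral-temporal features feed the classifier ensemble.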

16.
In machine learning, a combination of classifiers, known as an ensemble classifier, often outperforms individual ones. While many ensemble approaches exist, it remains a difficult task to find a suitable ensemble configuration for a particular dataset. This paper proposes a novel ensemble construction method that uses PSO-generated weights to create an ensemble of classifiers with better accuracy for intrusion detection. The local unimodal sampling (LUS) method is used as a meta-optimizer to find better behavioural parameters for PSO. For our empirical study, we took five random subsets from the well-known KDD99 dataset. Ensemble classifiers are created using the new approach as well as the weighted majority algorithm (WMA). Our experimental results suggest that the new approach can generate ensembles that outperform WMA in terms of classification accuracy.

17.
Several pruning strategies that can be used to reduce the size and increase the accuracy of bagging ensembles are analyzed. These heuristics select subsets of complementary classifiers that, when combined, can perform better than the whole ensemble. The pruning methods investigated are based on modifying the order of aggregation of classifiers in the ensemble. In the original bagging algorithm, the order of aggregation is left unspecified. When this order is random, the generalization error typically decreases as the number of classifiers in the ensemble increases. If an appropriate ordering for the aggregation process is devised, the generalization error reaches a minimum at intermediate numbers of classifiers. This minimum lies below the asymptotic error of bagging. Pruned ensembles are obtained by retaining a fraction of the classifiers in the ordered ensemble. The performance of these pruned ensembles is evaluated in several benchmark classification tasks under different training conditions. The results of this empirical investigation show that ordered aggregation can be used for the efficient generation of pruned ensembles that are competitive, in terms of performance and robustness of classification, with computationally more costly methods that directly select optimal or near-optimal subensembles.
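A minimal sketch of ordered aggregation with 0/1 predictions: the classifiers are greedily reordered so that each addition minimizes the majority-vote error of the growing subensemble, and the pruned ensemble keeps only a leading fraction of the ordering. (The greedy criterion here is one simple choice; the paper analyzes several ordering heuristics.)

```python
def ordered_aggregation(preds, truth, keep):
    """preds[i] is classifier i's 0/1 predictions on validation samples.
    Greedily build an ordering that minimizes majority-vote error at
    every step, then retain the first `keep` classifiers."""
    def vote_error(indices):
        wrong = 0
        for j, t in enumerate(truth):
            ones = sum(preds[i][j] for i in indices)
            vote = 1 if 2 * ones >= len(indices) else 0  # ties go to 1
            wrong += (vote != t)
        return wrong
    order, remaining = [], list(range(len(preds)))
    while remaining:
        nxt = min(remaining, key=lambda i: vote_error(order + [i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order[:keep]
```

Because the ordering front-loads complementary classifiers, the error curve of the ordered ensemble dips at intermediate sizes, and truncating there yields the pruned subensemble.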

18.
Ensemble design techniques based on training-set resampling are successfully used to reduce the classification errors of the base classifiers. Boosting is one such technique, in which each training set is obtained by drawing samples with replacement from the available training set according to a weighted distribution that is modified for each new classifier to be included in the ensemble. The weighted resampling results in a set of classifiers, each accurate in different parts of the input space, mainly specified by the sample weights. In this study, a dynamic integration of boosting-based ensembles is proposed to take into account the heterogeneity of the input sets. An evidence-theoretic framework is developed that accounts for the weights of, and distances to, the neighbouring training samples when both training and testing boosting-based ensembles. The effectiveness of the proposed technique is compared to the AdaBoost algorithm using three different base classifiers.
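The weighted-resampling step that produces each boosted training set can be sketched as:

```python
import random

def weighted_resample(samples, weights, seed=0):
    """Draw a new training set of the same size, with replacement,
    according to the current boosting distribution: samples with larger
    weights (the harder ones) are drawn more often, so the next base
    classifier concentrates on them."""
    rng = random.Random(seed)
    return rng.choices(samples, weights=weights, k=len(samples))

# All the weight on 'hard' means only 'hard' is ever drawn.
print(weighted_resample(['easy', 'hard'], weights=[0.0, 1.0]))  # prints ['hard', 'hard']
```

After each round, boosting renormalizes the weights, increasing those of misclassified samples, and this resampling is repeated with the updated distribution.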

19.
Boosting algorithms pay attention to the particular structure of the training data when learning, by iteratively emphasizing the importance of the training samples according to how difficult they are to classify correctly. If common kernel Support Vector Machines (SVMs) are used as base learners to construct a Real AdaBoost ensemble, the resulting ensemble can easily be compacted into a monolithic architecture by simply combining the weights that correspond to the same kernels when they appear in different learners, retaining the potential performance advantage without increasing the operational computational effort. In this way, the performance advantage that boosting provides can be obtained for monolithic SVMs, i.e., without paying a classification-time cost for evaluating many learners. However, SVMs are both stable and strong, and using them for boosting requires unstabilizing and weakening them; previous attempts in this direction have shown only moderate success. In this paper, we propose a combination of a new, appropriately designed subsampling process and an SVM algorithm that permits sparsity control, to overcome the difficulties in boosting SVMs and obtain designs with improved performance. Experimental results support the effectiveness of the approach, not only in performance but also in the compactness of the resulting classifiers, and show that combining both design ideas is needed to arrive at these advantageous designs.
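The compaction step relies on the decision functions being linear in the kernel evaluations, so coefficients attached to the same kernel term can simply be summed across learners. The sketch below uses an illustrative representation (each learner as a dict of coefficients keyed by (support vector, kernel) plus a bias), not the paper's code.

```python
from collections import defaultdict

def compact_ensemble(learners):
    """Collapse an ensemble of kernel machines into one monolithic
    machine: coefficients attached to the same (support_vector, kernel)
    term are summed, and so are the biases, so the merged decision
    function needs no more kernel evaluations than the union of
    support-vector terms across learners."""
    merged, bias = defaultdict(float), 0.0
    for terms, b in learners:
        bias += b
        for key, coef in terms.items():
            merged[key] += coef
    return dict(merged), bias

learners = [({('x1', 'rbf'): 0.5}, 0.1),
            ({('x1', 'rbf'): 0.25, ('x2', 'rbf'): -0.3}, -0.05)]
coefs, bias = compact_ensemble(learners)
```

The shared term ('x1', 'rbf') appears once in the merged machine with coefficient 0.75, which is exactly why the ensemble's run-time cost collapses to that of a single SVM over the combined support set.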

20.
Financial distress prediction (FDP) is of great importance to parties both inside and outside companies. Although much of the literature has comprehensively analyzed single-classifier FDP methods, ensemble methods for FDP have emerged only in recent years and need further study. The support vector machine (SVM) shows promising performance in FDP compared with other single-classifier methods. The contribution of this paper is a new FDP method based on an SVM ensemble whose candidate base classifiers are trained by SVM algorithms with different kernel functions on different feature subsets of one initial dataset. SVM kernels such as linear, polynomial, RBF and sigmoid are applied, together with the filter feature selection/extraction methods of stepwise multiple discriminant analysis (MDA), stepwise logistic regression (logit), and principal component analysis (PCA). The algorithm for selecting the SVM ensemble's base classifiers from the candidates is designed by considering both individual performance and diversity. Weighted majority voting based on the base classifiers' cross-validation accuracy on the training dataset is used as the combination mechanism. Experimental results indicate that the SVM ensemble is significantly superior to an individual SVM classifier when the number of base classifiers in the ensemble is properly set. They also show that an RBF SVM based on features selected by stepwise MDA is a good choice for FDP when an individual SVM classifier is applied.
