Similar Documents
20 similar documents found (search time: 15 ms)
1.
An imbalanced-classification algorithm based on training-set decomposition is proposed. The algorithm uses support vector machines that output posterior probabilities as base classifiers and integrates them with a measurement-level information-fusion rule. Simulation experiments on four imbalanced data sets from different domains show that the algorithm effectively improves accuracy on positive-class samples while keeping misclassification of negative-class samples as low as possible. The results confirm the effectiveness of ensemble learning for imbalanced classification problems.
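A minimal sketch of the decompose-then-fuse idea described above, with a toy 1-D prototype classifier standing in for the posterior-probability SVM (all function names are illustrative, not the paper's):

```python
import math
import random

def train_prob_stub(X, y):
    # toy stand-in for an SVM with posterior outputs: a 1-D prototype classifier
    # whose "posterior" is a sigmoid of the signed distance to the class midpoint
    m_pos = sum(x for x, t in zip(X, y) if t == 1) / max(1, sum(y))
    m_neg = sum(x for x, t in zip(X, y) if t == 0) / max(1, len(y) - sum(y))
    mid, sign = (m_pos + m_neg) / 2.0, 1.0 if m_pos >= m_neg else -1.0
    return lambda x: 1.0 / (1.0 + math.exp(-sign * (x - mid)))

def decomposed_ensemble(X_pos, X_neg, k, rng):
    # decompose the training set: split the majority (negative) class into k
    # chunks and pair each chunk with every positive sample
    neg = X_neg[:]
    rng.shuffle(neg)
    chunks = [neg[i::k] for i in range(k)]
    members = [train_prob_stub(X_pos + c, [1] * len(X_pos) + [0] * len(c))
               for c in chunks]
    # measurement-level fusion: average the members' posterior probabilities
    return lambda x: sum(m(x) for m in members) / len(members)
```

Each member sees a balanced subproblem, and the fused posterior stays well calibrated toward the positive class.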

2.
Ensemble learning is widely used to improve classification accuracy, and recent studies show that building ensembles with multi-modal perturbation strategies can further improve performance. This paper proposes an ensemble pruning algorithm based on approximate reducts and optimal sampling (EPA_AO). In EPA_AO, a multi-modal perturbation strategy is designed to build diverse individual classifiers: it perturbs the attribute space and the training set simultaneously, increasing the diversity of the individual classifiers. The evidential KNN (K-nearest-neighbor) algorithm is used to train the individual classifiers, and EPA_AO is compared with existing algorithms of the same type on multiple UCI data sets. Experimental results show that EPA_AO is an effective ensemble learning method.

3.
AdaBoost is a classical ensemble learning framework that builds a strong learner as a linear combination of weak classifiers; its accuracy is far higher than that of any single weak classifier, and it exhibits good generalization and training error behavior. However, AdaBoost cannot prune the weak classifiers from its output model, so the model lacks interpretability. This paper introduces a genetic algorithm into the AdaBoost framework and proposes an ensemble evolutionary classification algorithm that limits the size of the output model (Ensemble evolve classification algorithm for controlling the size of final model, ECSM). Genetic operators and a fitness function preserve population diversity within the AdaBoost iteration and retain the better classifiers. Experimental results show that, compared with classical AdaBoost, the proposed algorithm greatly reduces the number of classifiers while essentially preserving classification accuracy.
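The linear-combination core that ECSM prunes can be sketched as plain AdaBoost over decision stumps (a generic textbook version, not the paper's code; the genetic pruning step is omitted):

```python
import math

def best_stump(X, y, w):
    # exhaustive search for the weighted-error-minimizing 1-D threshold stump
    best = None
    for thr in sorted(set(X)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (1 if sign * (xi - thr) >= 0 else -1) != yi)
            if best is None or err < best[2]:
                best = (thr, sign, err)
    return best

def adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        thr, sign, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # weak-classifier coefficient
        model.append((alpha, thr, sign))
        # reweight: misclassified samples gain weight, then renormalize
        w = [wi * math.exp(-alpha * yi * (1 if sign * (xi - thr) >= 0 else -1))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    def predict(x):
        # the strong learner: sign of the linear combination of stumps
        s = sum(a * (1 if sg * (x - t) >= 0 else -1) for a, t, sg in model)
        return 1 if s >= 0 else -1
    return predict
```

ECSM would then evolve subsets of `model`, keeping accuracy while shrinking the list of stumps.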

4.
Minimum classification error learning realized via generalized probabilistic descent, usually referred to as MCE/GPD, is a very popular and powerful framework for building classifiers. This paper first presents a theoretical analysis of MCE/GPD. The focus is on a simple classification problem: estimating the means of two Gaussian classes. For this simple algorithm, we derive difference equations for the class means and decision threshold during learning, and develop closed-form expressions for the evolution of both the smoothed and the true error. In addition, we show that the decision threshold converges to its optimal value, and provide an estimate of the number of iterations needed to approach convergence. After convergence, the class means continue to drift apart without bound, increasing their separation without contributing to any further decrease in classification error. This behavior, referred to as mean drift, is then related to the increase of the variance of the classifier. The theoretical results perfectly agree with simulations carried out for a two-class Gaussian classification problem. Beyond the theoretical results, we verify experimentally, in speech recognition experiments, that MCE/GPD learning of Gaussian mixture hidden Markov models qualitatively follows the pattern suggested by the theoretical analysis. We also discuss links between MCE/GPD learning and both batch gradient descent and extended Baum–Welch re-estimation, two approaches popular in large-scale implementations of discriminative training. The proposed analysis can therefore be used, at least as a rough guideline, for better understanding the properties of discriminative training algorithms for speech recognition.

5.
We propose a novel discriminative learning approach for Bayesian pattern classification, called 'constrained maximum margin (CMM)'. We define the margin between two classes as the difference between the minimum decision value for positive samples and the maximum decision value for negative samples. The learning problem is to maximize the margin under the constraint that each training pattern is classified correctly. This nonlinear programming problem is solved using the sequential unconstrained minimization technique. We applied the proposed CMM approach to learn Bayesian classifiers based on Gaussian mixture models, and conducted experiments on 10 UCI datasets. The performance of our approach was compared with those of the expectation-maximization algorithm, the support vector machine, and other state-of-the-art approaches. The experimental results demonstrate the effectiveness of our approach.
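The margin definition quoted above is simple enough to state directly in code; this is a sketch of the definition and the correctness constraint only, not the sequential unconstrained minimization solver (names are illustrative):

```python
def cmm_margin(score, positives, negatives):
    # the abstract's margin: minimum decision value over positive samples
    # minus maximum decision value over negative samples
    return min(score(x) for x in positives) - max(score(x) for x in negatives)

def satisfies_constraints(score, positives, negatives, threshold=0.0):
    # CMM's constraint: every training pattern is classified correctly
    return (all(score(x) > threshold for x in positives)
            and all(score(x) < threshold for x in negatives))
```

Maximizing `cmm_margin` subject to `satisfies_constraints` is the learning problem the paper solves over Gaussian-mixture decision values.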

6.
Precipitation prediction based on an improved Adaboost-BP model
王军  费凯  程勇 《计算机应用》2017,37(9):2689-2693
To address the low generalization ability and insufficient accuracy of current classification algorithms for precipitation prediction, a combined model that uses an improved AdaBoost algorithm to ensemble back-propagation (BP) neural networks is proposed. The model builds several neural-network weak classifiers, assigns them weights, and combines them linearly into a strong classifier. The improved AdaBoost algorithm takes minimizing the normalization factor as its objective and adjusts the sample-weight update strategy during boosting, so that adding weak classifiers also lowers the upper bound on the error estimate; the final strong classifier thus has better generalization ability and classification accuracy. Daily meteorological data from six stations in Jiangsu were used as experimental data to build forecast models for seven precipitation levels, and 12 attributes strongly correlated with precipitation were selected as predictors from the many factors that influence rainfall. Repeated experiments show that the improved Adaboost-BP model performs well, adapting especially well to station 58259 with an overall classification accuracy of 81%; among the seven levels, level-0 (no rain) is predicted most accurately, and accuracy on the other levels improves to varying degrees. Theoretical derivation and experimental results show that the improvement increases prediction accuracy.

7.
Rotation Forest, an effective ensemble classifier generation technique, works by using principal component analysis (PCA) to rotate the original feature axes so that different training sets for learning base classifiers can be formed. This paper presents a variant of Rotation Forest that can be viewed as a combination of Bagging and Rotation Forest. Bagging is used here to inject more randomness into Rotation Forest and thereby increase the diversity of the ensemble membership. Experiments conducted on 33 benchmark classification data sets from the UCI repository, with a classification tree adopted as the base learning algorithm, demonstrate that the proposed method generally produces ensemble classifiers with lower error than Bagging, AdaBoost and Rotation Forest. A bias–variance analysis of the error shows that the proposed method improves on the prediction error of a single classifier by reducing the variance term much more than the other ensemble procedures considered. Furthermore, results on data sets with artificial classification noise indicate that the new method is more robust to noise, and kappa-error diagrams are employed to investigate the diversity–accuracy patterns of the ensemble classifiers.
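A hedged sketch of the rotation-construction step, using a bootstrap sample per feature subset to inject Bagging-style randomness; subset formation and the base tree are omitted, and the paper's exact sampling scheme may differ:

```python
import numpy as np

def rotation_matrix(X, feature_subsets, rng):
    # block-diagonal rotation: run PCA on a bootstrap sample of each feature
    # subset and place the eigenvector blocks on the diagonal
    d = X.shape[1]
    R = np.zeros((d, d))
    for idx in feature_subsets:
        boot = X[rng.integers(0, len(X), size=len(X))][:, idx]
        cov = np.atleast_2d(np.cov(boot, rowvar=False))
        _, vecs = np.linalg.eigh(cov)            # orthonormal principal axes
        R[np.ix_(idx, idx)] = vecs
    return R
```

Each ensemble member would then draw its own rotation `R` and train a base tree on the rotated data `X @ R` (for its own bootstrap sample, per the Bagging component).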

8.
Ensembles that combine the decisions of classifiers generated by using perturbed versions of the training set where the classes of the training examples are randomly switched can produce a significant error reduction, provided that large numbers of units and high class switching rates are used. The classifiers generated by this procedure have statistically uncorrelated errors in the training set. Hence, the ensembles they form exhibit a similar dependence of the training error on ensemble size, independently of the classification problem. In particular, for binary classification problems, the classification performance of the ensemble on the training data can be analysed in terms of a Bernoulli process. Experiments on several UCI datasets demonstrate the improvements in classification accuracy that can be obtained using these class-switching ensembles.
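A small sketch of class-switching ensembles under stated assumptions: 1-D data, a trivial nearest-class-mean base learner (my stand-in, not the paper's), and plain majority voting:

```python
import random

def switch_labels(y, p, classes, rng):
    # flip each label to a uniformly chosen *different* class with probability p
    return [rng.choice([c for c in classes if c != t]) if rng.random() < p else t
            for t in y]

def nearest_mean_learner(X, y):
    # trivial base learner: predict the class whose training mean is closest
    means = {c: sum(x for x, t in zip(X, y) if t == c) / y.count(c)
             for c in set(y)}
    return lambda x: min(means, key=lambda c: abs(x - means[c]))

def class_switching_ensemble(X, y, n_members, p, rng):
    classes = sorted(set(y))
    members = [nearest_mean_learner(X, switch_labels(y, p, classes, rng))
               for _ in range(n_members)]
    def predict(x):
        votes = [m(x) for m in members]
        return max(classes, key=votes.count)   # majority vote
    return predict
```

With high switching rates, the abstract's point is that each member's training errors become nearly uncorrelated, so the majority vote recovers the clean decision as the ensemble grows.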

9.
This paper presents a strategy to improve the AdaBoost algorithm with a quadratic combination of base classifiers. We observe that learning this combination is necessary to get better performance and is possible by constructing an intermediate learner operating on the combined linear and quadratic terms. This is not trivial, as the parameters of the base classifiers are not under direct control, obstructing the application of direct optimization. We propose a new method realizing iterative optimization indirectly. First we train a classifier by randomizing the labels of training examples. Subsequently, the input learner is called repeatedly with a systematic update of the labels of the training examples in each round. We show that the quadratic boosting algorithm converges under the condition that the given base learner minimizes the empirical error. We also give an upper bound on the VC-dimension of the new classifier. Our experimental results on 23 standard problems show that quadratic boosting compares favorably with AdaBoost on large data sets at the cost of training speed. The classification time of the two algorithms, however, is equivalent.

10.
朱亮  徐华  崔鑫 《计算机应用》2021,41(8):2225-2231
To address the low efficiency of the linear combination of base classifiers and the over-fitting problem in the traditional AdaBoost algorithm, an improved algorithm based on base-classifier coefficients and diversity, WD AdaBoost, is proposed. First, a new method for computing base-classifier coefficients is derived from the error rate of each base classifier and the distribution of sample weights, improving the combination efficiency of the base classifiers. Second, in the base-classifier selection strategy, WD AdaBoost introduces a double-error measure to increase diversity among base classifiers. On five data sets from different real-world application domains, the new coefficient method (CeffAda) lowers test error by 1.2 percentage points on average compared with traditional AdaBoost; meanwhile, WD AdaBoost achieves lower error rates than WLDF_Ada, AD_Ada, sk_AdaBoost and other algorithms. Experimental results show that WD AdaBoost integrates base classifiers more efficiently, resists over-fitting, and improves classification performance.

11.
Objective: Fine-grained classification has attracted increasing attention in recent years; its difficulty is that the differences between target classes are very small. We propose a classification-error-guided hierarchical bilinear convolutional neural network model. Method: The core idea is to retrain and re-classify, separately, the classes that the bilinear CNN (B-CNN) tends to misclassify and confuse. First, to find the easily confused classes, a classification-error-guided clustering algorithm is proposed; it builds on the constrained Laplacian rank (CLR) clustering model, whose core affinity matrix is constructed from a classification-error matrix. Then a new hierarchical B-CNN model is built on the clustering result. Results: On the CUB-200-2011, FGVC-Aircraft-2013b and Stanford-cars benchmark data sets, the classification-error-guided hierarchical B-CNN raises accuracy from 84.35%, 83.56% and 89.45% for the single-layer B-CNN to 84.67%, 84.11% and 89.78% respectively, validating the effectiveness of the algorithm. Conclusion: Guiding clustering with a classification-error matrix and then re-classifying addresses the classification problem directly, unlike affinity matrices built from feature similarity, and effectively improves accuracy on easily confused classes. By grouping easily misclassified and confused targets and then retraining and re-classifying them, the method performs well on very similar targets and is well suited to fine-grained classification.

12.
This paper presents a new method for linearly combining multiple neural network classifiers based on statistical pattern recognition theory. In our approach, several neural networks are first selected according to which works best for each class in terms of minimizing classification error. They are then linearly combined to form an ideal classifier that exploits the strengths of the individual classifiers. The minimum classification error criterion is used to estimate the optimal linear weights. Because the classification decision rule is incorporated into the cost function in this formulation, a combination of weights better suited to the classification objective can be obtained. Experimental results using artificial and real data sets show that the proposed method constructs a combined classifier that outperforms the best single classifier in terms of overall classification error on test data.

13.
Automatic text classification is one of the most important tools in Information Retrieval. This paper presents a novel text classifier using positive and unlabeled examples. The primary challenge of this problem, compared with the classical text classification problem, is that no labeled negative documents are available in the training example set. Firstly, we identify many more reliable negative documents by an improved 1-DNF algorithm with a very low error rate. Secondly, we build a set of classifiers by iteratively applying the SVM algorithm on a training data set, which is augmented during iteration. Thirdly, different from previous PU-oriented text classification works, we adopt the weighted vote of all classifiers generated in the iteration steps to construct the final classifier, instead of choosing one of the classifiers as the final classifier. Finally, we use PSO (Particle Swarm Optimization) to evaluate this weighted vote and discover the best combination of the weights. In addition, we built a focused crawler based on link-contexts guided by different classifiers to evaluate our method. Several comprehensive experiments have been conducted using the Reuters data set and thousands of web pages. Experimental results show that our method improves performance (F1-measure) compared with PEBL, and that a focused web crawler guided by our PSO-based classifier outperforms several other classifiers in both harvest rate and target recall.
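The weighted-vote construction can be sketched as follows; a simple random search stands in for the paper's PSO (hedged: it only keeps the best weight vector seen so far, which guarantees it never does worse than uniform weights on the training sample):

```python
import random

def vote_error(weights, classifiers, X, y):
    # error rate of the weighted +/-1 vote on a labelled sample
    wrong = 0
    for xi, yi in zip(X, y):
        s = sum(w * c(xi) for w, c in zip(weights, classifiers))
        wrong += (1 if s >= 0 else -1) != yi
    return wrong / len(X)

def search_vote_weights(classifiers, X, y, iters, rng):
    # stand-in for PSO: random search keeping the best weight vector,
    # so the final error is monotonically non-increasing
    best_w = [1.0] * len(classifiers)
    best_e = vote_error(best_w, classifiers, X, y)
    for _ in range(iters):
        w = [rng.random() for _ in classifiers]
        e = vote_error(w, classifiers, X, y)
        if e < best_e:
            best_w, best_e = w, e
    return best_w, best_e
```

PSO would explore the same weight space with velocity/position updates; the objective function `vote_error` is unchanged.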

14.
杨菊  袁玉龙  于化龙 《计算机科学》2016,43(10):266-271
To address the low classification accuracy and poor generalization of existing extreme learning machine (ELM) ensemble algorithms, a selective ensemble learning algorithm for extreme learning machines based on ant colony optimization is proposed. The algorithm first generates a large number of diverse ELM classifiers by randomly assigning hidden-layer input weights and biases, then uses a binary ant-colony search to iteratively find the optimal classifier combination, and finally classifies test samples with that combination. Tested on 12 standard data sets, the algorithm achieved the best results on 9 data sets and the second-best on the other 3, significantly improving classification accuracy and generalization performance.

15.
《Applied Soft Computing》2007,7(3):1102-1111
Classification and association rule discovery are important data mining tasks. Using association rule discovery to construct classification systems, also known as associative classification, is a promising approach. In this paper, a new associative classification technique, the Ranked Multilabel Rule (RMR) algorithm, is introduced, which generates rules with multiple labels. Rules derived by current associative classification algorithms overlap in their training objects, resulting in many redundant and useless rules. The proposed algorithm, however, resolves the overlap between rules in the classifier by generating rules that do not share training objects during the training phase, resulting in a more accurate classifier. Results obtained from experiments on 20 binary, multi-class and multi-label data sets show that the proposed technique is able to produce classifiers that contain rules associated with multiple classes. Furthermore, the results reveal that removing the overlap of training objects between the derived rules produces classifiers that are highly competitive, with respect to error rate, with those extracted by decision trees and other associative classification techniques.

16.
A cost-sensitive AdaBoost algorithm for multi-class classification problems
付忠良 《自动化学报》2011,37(8):973-983
To address the cost-merging problem that arises when multi-class cost-sensitive classification is converted into two-class cost-sensitive classification, a cost-sensitive AdaBoost algorithm that applies directly to multi-class problems is constructed. The algorithm has a procedure and error estimate similar to those of real (continuous) AdaBoost. When all costs are equal, it becomes a new multi-class real AdaBoost algorithm that guarantees the training error rate decreases as classifiers are added, without directly requiring the individual classifiers to be mutually independent; the independence condition can instead be ensured by the algorithm's rules, whereas the derivation of existing multi-class real AdaBoost algorithms must assume mutual independence. Experimental data show that the algorithm genuinely biases classification results toward classes with lower misclassification cost; in particular, when the costs of misclassifying each class into the others are unbalanced but the average costs are equal, existing multi-class cost-sensitive learning algorithms fail, while the new method still achieves the minimum misclassification cost. The approach offers a new direction for research on ensemble learning algorithms and yields an easy-to-use AdaBoost algorithm for multi-label classification that approximately minimizes the classification error rate.

17.
Fast recognition of musical genres using RBF networks
This paper explores the automatic classification of audio tracks into musical genres. Our goal is to achieve human-level accuracy with fast training and classification. This goal is achieved with radial basis function (RBF) networks by using a combination of unsupervised and supervised initialization methods. These initialization methods yield classifiers that are as accurate as RBF networks trained with gradient descent (which is hundreds of times slower). In addition, feature subset selection further reduces training and classification time while preserving classification accuracy. Combined, our methods succeed in creating an RBF network that matches the musical classification accuracy of humans. The general algorithmic contribution of this paper is to show experimentally that RBF networks initialized with a combination of methods can yield good classification performance without relying on gradient descent. The simplicity and computational efficiency of our initialization methods produce classifiers that are fast to train as well as fast to apply to novel data. We also present an improved method for initializing the k-means clustering algorithm, which is useful for both unsupervised and supervised initialization methods.
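The supervised half of such an initialization scheme (solving output weights in closed form rather than by gradient descent) can be sketched with a Gaussian design matrix and linear least squares; center placement, e.g. by k-means, is assumed given, and all names are illustrative:

```python
import numpy as np

def gaussian_design(X, centers, width):
    # matrix of RBF activations: one column per center
    return np.exp(-((X[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

def fit_rbf(X, Y, centers, width):
    # supervised step after unsupervised center initialization:
    # output weights come from linear least squares, no gradient descent
    Phi = gaussian_design(X, centers, width)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return lambda x: gaussian_design(np.atleast_1d(x), centers, width) @ W
```

This is why such initialization is hundreds of times faster than gradient descent: a single linear solve replaces iterative training of the output layer.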

18.
To address the long training time of the traditional Adaboost algorithm, a dual-threshold Adaboost face detection algorithm with feature pruning is proposed. On one hand, dual-threshold weak classifiers replace the traditional single-threshold ones, improving the classification ability of each individual weak classifier; on the other hand, the feature-pruned Adaboost algorithm trains on only the features with lower error rates in each round. Experiments show that the feature-pruned dual-threshold Adaboost face detection algorithm speeds up training by using fewer features and reducing the number of features considered during training.
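A sketch of the dual-threshold weak learner: instead of a single cut point, it searches a band [lo, hi] of feature values and predicts positive inside it (exhaustive search over value pairs, illustrative only):

```python
def dual_threshold_stump(vals, labels, weights):
    # search pairs (lo, hi): predict +1 inside [lo, hi], -1 outside,
    # and return the pair with minimum weighted error
    cand = sorted(set(vals))
    best = None
    for i, lo in enumerate(cand):
        for hi in cand[i:]:
            err = sum(w for v, y, w in zip(vals, labels, weights)
                      if (1 if lo <= v <= hi else -1) != y)
            if best is None or err < best[2]:
                best = (lo, hi, err)
    return best
```

A single-threshold stump cannot separate a positive band flanked by negatives on both sides, which is exactly the case where the dual-threshold version is the stronger weak classifier.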

19.
Remote sensing image classification is a common application of remote sensing imagery. To improve its performance, multiple classifier combination is used to classify Landsat-8 Operational Land Imager (Landsat-8 OLI) images. Several combination techniques and algorithms are investigated, and a classifier ensemble consisting of five member classifiers is constructed. The results of each member classifier are evaluated, and a voting strategy is used to combine their classification results. The results show that the classifiers perform differently and that the multiple classifier combination provides better performance than any single classifier, achieving higher overall classification accuracy. The experiments show that the combinations using producer's accuracy as the voting weight (MCCmod2 and MCCmod3) achieve higher classification accuracy than the algorithm using overall accuracy as the voting weight (MCCmod1), and that combinations with different voting weights affect the classification results differently across land-cover types. The multiple classifier combination algorithm presented in this article, with voting weights based on the accuracy of the member classifiers, may have stability problems, which need to be addressed in future studies.
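The producer's-accuracy voting weight can be sketched from a confusion matrix whose rows are reference classes (this is my reading of the MCCmod2/MCCmod3 weighting from the abstract; the actual algorithms may differ in detail):

```python
def producers_accuracy(conf):
    # conf[i][j]: reference-class-i samples predicted as class j (rows = reference)
    # producer's accuracy for class i = correct class-i predictions / class-i total
    return [conf[i][i] / sum(conf[i]) for i in range(len(conf))]

def weighted_class_vote(member_preds, member_pa, n_classes):
    # each member votes for its predicted class, weighted by that member's
    # producer's accuracy for the class it predicted
    scores = [0.0] * n_classes
    for pred, pa in zip(member_preds, member_pa):
        scores[pred] += pa[pred]
    return max(range(n_classes), key=scores.__getitem__)
```

A member that is unreliable for a given land-cover class thus contributes little weight when it votes for that class.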

20.
杨显飞  张健沛  杨静 《计算机工程》2011,37(20):180-182
Selective ensemble classification algorithms can improve the performance of a classifier ensemble on a data set as a whole, but the subset of individual classifiers they select is not necessarily the optimal combination for classifying a specific instance. From a data-adaptive perspective, this paper proposes a two-stage dynamic fusion method for selective ensembles on data streams: it uses the position of the instance to be classified in feature space to dynamically select a subset of individual classifiers, which then classifies the instance. Theoretical analysis and experimental results show that the method achieves higher classification accuracy than the GASEN algorithm.
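A minimal sketch of the two-stage dynamic idea: use the instance's position in feature space to find a local validation neighbourhood, then let the locally most accurate member classify it (1-D distance and selecting a single member are my simplifications of the method):

```python
def dynamic_select(members, X_val, y_val, x, k):
    # stage 1: find the k validation points nearest to x in feature space
    near = sorted(range(len(X_val)), key=lambda i: abs(X_val[i] - x))[:k]
    # stage 2: pick the member most accurate on that local neighbourhood
    def local_accuracy(m):
        return sum(m(X_val[i]) == y_val[i] for i in near)
    return max(members, key=local_accuracy)(x)
```

Unlike a static selective ensemble, the chosen subset changes from instance to instance, which is the data-adaptive point the abstract makes.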


Copyright © 北京勤云科技发展有限公司 (京ICP备09084417号)