Similar Documents
 20 similar documents found (search time: 187 ms)
1.
樊康新 《计算机工程》2009,35(24):191-193
To address shortcomings of the naive Bayes (NB) classifier, such as the model's sensitivity to the training samples and the difficulty of improving classification accuracy, an NB ensemble text classifier built from multiple feature selection methods is proposed. Following the Boosting algorithm, several different feature selection methods are used to construct the feature-word sets of the texts, NB classifiers trained on them serve as the base classifiers of the Boosting iterations, and the final NB ensemble text classifier is produced by weighted voting over the base classifiers. Experimental results show that the ensemble classifier outperforms a single NB text classifier.
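A minimal sketch of the idea, assuming scikit-learn: Gaussian NB stands in for the paper's text-oriented NB, two generic score functions (f_classif, mutual_info_classif) stand in for its multiple feature selection methods, and vote weights are taken as validation accuracy rather than the paper's Boosting-derived weights.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=600, n_features=40, n_informative=8, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

members = []
for score_fn in (f_classif, mutual_info_classif):    # two feature-selection views
    sel = SelectKBest(score_fn, k=10).fit(X_tr, y_tr)
    clf = GaussianNB().fit(sel.transform(X_tr), y_tr)
    weight = clf.score(sel.transform(X_val), y_val)  # vote weight = validation accuracy
    members.append((sel, clf, weight))

def predict(X_new):
    # Weighted vote over the NB base classifiers' class probabilities.
    votes = sum(w * clf.predict_proba(sel.transform(X_new)) for sel, clf, w in members)
    return votes.argmax(axis=1)

print("ensemble accuracy:", (predict(X_val) == y_val).mean())
```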

2.
A new method, PCARules, is proposed for building ensembles from rule-based base classifiers. Although the new method also decides the class of an unseen sample by weighted voting over the base classifiers' predictions, it creates the base classifiers' training sets in a way completely different from bagging and boosting. Instead of sampling, the method randomly partitions the features into K subsets, applies PCA to obtain the principal components of each subset, forms a new feature space from them, and maps all training data into this space as a base classifier's training set. Experiments on 30 randomly selected data sets from the UCI machine learning repository show that the algorithm not only significantly improves the performance of rule-based classification, but also achieves higher accuracy than traditional ensemble methods such as bagging and boosting on most data sets.
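The feature-space construction resembles a rotation-forest step. A minimal sketch, with a decision tree standing in for the paper's rule-based learner and equal vote weights instead of its learned weights:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
K, n_members = 3, 10

ensemble = []
for _ in range(n_members):
    groups = np.array_split(rng.permutation(X.shape[1]), K)  # random feature partition
    pcas = [PCA().fit(X[:, g]) for g in groups]              # PCA per feature subset
    Z = np.hstack([p.transform(X[:, g]) for p, g in zip(pcas, groups)])
    tree = DecisionTreeClassifier(random_state=0).fit(Z, y)  # learner in the new space
    ensemble.append((groups, pcas, tree))

def predict(X_new):
    votes = np.zeros((len(X_new), 2))
    for groups, pcas, tree in ensemble:
        Z = np.hstack([p.transform(X_new[:, g]) for p, g in zip(pcas, groups)])
        votes += tree.predict_proba(Z)                       # equal-weight vote here
    return votes.argmax(axis=1)

print("train accuracy:", (predict(X) == y).mean())
```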

3.
曹鹏  李博  栗伟  赵大哲 《计算机应用》2013,33(2):550-553
To address the low classification accuracy and degraded efficiency on large-scale data, an adaptive random-subspace ensemble classification algorithm combined with X-means clustering is proposed. First, X-means clustering automatically decomposes a complex data space into multiple sample subspaces for divide-and-conquer learning while preserving the original data structure; the adaptive random-subspace ensemble then increases the diversity of the base classifiers and determines their number automatically, improving the robustness and accuracy of the ensemble. The algorithm was tested on synthetic and UCI data sets and compared with traditional single-classifier and ensemble algorithms. Experimental results show that, on large-scale data sets, the method achieves better classification accuracy and robustness and improves overall efficiency.

4.
Linear discriminant classifiers are an effective pattern-analysis technique, with the Fisher criterion the most widely applied, and various improved linear extraction methods already exist. Information gain is introduced to build an information-gain-based optimal-combination-factor discriminant classifier, thereby optimizing the combination-factor classifier. Experimental results show that the optimized classifier effectively eliminates redundant factors and achieves good classification accuracy.

5.
This paper discusses stacking for classifier combination and the corresponding stacking strategies, and proposes a new stacking-based meta-learning strategy. The strategy takes a vote over the base classifiers' predictions and uses the voting results as the level-1 training data. Experimental results show that the method classifies better than simple averaging of posterior probabilities.
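A minimal stacking sketch along these lines, assuming scikit-learn; the base and meta models are illustrative choices, and out-of-fold predictions supply the level-1 votes:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
bases = [GaussianNB(), KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]

# Level-1 data: each column is one base classifier's out-of-fold vote.
meta_tr = np.column_stack([cross_val_predict(b, X_tr, y_tr, cv=5) for b in bases])
meta = LogisticRegression().fit(meta_tr, y_tr)

for b in bases:                                  # refit bases on the full level-0 data
    b.fit(X_tr, y_tr)
meta_te = np.column_stack([b.predict(X_te) for b in bases])
print("stacking accuracy:", meta.score(meta_te, y_te))
```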

6.
Most traditional classification algorithms are built on the assumption of balanced data sets, and their performance degrades on imbalanced data, while practice shows that ensemble selection can effectively improve classification performance on imbalanced data sets. This work therefore approaches imbalanced-class learning from the ensemble-selection perspective and proposes a new ensemble pruning method for improving an ensemble's classification performance on imbalanced data. Bagging is used to build a classifier library, the positive-class (minority-class) instances are used directly as the pruning set, and with the MBM measure and this pruning set an optimal or near-optimal sub-ensemble is selected from the library as the target classifier for predicting unseen instances. Experimental results on 12 UCI data sets show that, compared with EasyEnsemble, Bagging and C4.5, the method not only greatly improves the ensemble's recall on the positive class but also improves overall accuracy.
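A minimal sketch of the pruning loop, assuming scikit-learn; greedy forward selection scored by positive-class recall on the minority-instance pruning set stands in for the paper's MBM measure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import recall_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, weights=[0.9], random_state=0)  # imbalanced
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30,
                        random_state=0).fit(X, y)                 # classifier library
X_pos, y_pos = X[y == 1], y[y == 1]        # pruning set = positive (minority) instances

selected, pool, best = [], list(bag.estimators_), -1.0
while pool:
    gains = []
    for c in pool:
        votes = np.mean([m.predict(X_pos) for m in selected + [c]], axis=0)
        gains.append(recall_score(y_pos, (votes >= 0.5).astype(int)))
    i = int(np.argmax(gains))
    if gains[i] <= best:                   # stop when recall no longer improves
        break
    best = gains[i]
    selected.append(pool.pop(i))

print(f"kept {len(selected)}/30 members, positive recall on pruning set: {best:.3f}")
```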

7.
To improve classification performance, a dynamic multi-classifier combination method based on information entropy [1] (EMDA) is proposed. The method was tested on several UCI benchmark data sets and compared against the individual base classifiers trained by the ensemble learning algorithm AdaBoost, demonstrating its effectiveness.
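One hedged reading of entropy-based dynamic combination, as a sketch: each member's vote on each test instance is weighted by the inverse entropy of its predicted class distribution, so more confident members count more per instance. The weighting rule is an illustrative interpretation, not EMDA's exact formula.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_classes=3, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
members = [m.fit(X_tr, y_tr) for m in
           (GaussianNB(), KNeighborsClassifier(), DecisionTreeClassifier(random_state=0))]

probs = np.stack([m.predict_proba(X_te) for m in members])  # (members, samples, classes)
conf = 1.0 / (entropy(probs, axis=2) + 1e-9)                # per-instance confidence
weighted = (probs * conf[:, :, None]).sum(axis=0)           # dynamic weighted vote
print("accuracy:", (weighted.argmax(axis=1) == y_te).mean())
```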

8.
Because hyperspectral data have many bands and a large volume, dimensionality reduction is an important problem in hyperspectral remote sensing research. A hyperspectral band selection method based on multi-classifier combination is proposed. The method exploits the strong search capability of a genetic algorithm to obtain several good initial band subsets, trains a number of base classifiers on these subsets, and then uses an improved classifier selection method based on a same-error diversity measure to pick out the better classifiers, thereby accomplishing band selection; the final multi-classifier decision is made by dynamic classifier selection based on local accuracy analysis. Experimental results on public test data sets show that, compared with directly selecting a single optimal band subset, the proposed algorithm selects more bands with discriminative power and clearly improves classification accuracy.

9.
A Classifier Selection Method   Cited by: 1 (self-citations: 1, others: 0)
牛鹏  魏维  李峻金  郭建国 《计算机工程》2010,36(14):163-165
When a multi-classifier system is designed by the "test-and-select" approach, choosing an optimal subset from an over-produced pool of candidate classifiers is one of the key steps. On this basis, a combination suitability concept is defined and a new classifier selection method is proposed. The method is applied to hyperspectral remote-sensing data classification experiments, selecting subsets from a pool of 27 candidate classifiers. Experimental results show that the method has advantages in selection efficiency and recognition accuracy, and preserves the generalization ability of the selected subset.

10.
To improve the classification accuracy of the minority class in imbalanced data sets, ensemble classification algorithms are studied and a new ensemble algorithm, WDB, is proposed. The algorithm uses two different base classifiers, the C4.5 decision tree and naive Bayes, and takes precision as the weight: through "weight learning", the weight of each base classifier is adjusted automatically according to the training set. The base classifiers' predictions are then combined algebraically by weighted averaging, yielding the new classification algorithm WDB. Finally, experiments were run on open imbalanced data sets with common performance measures. The results show that introducing "weight learning" into ensemble classification exploits each base classifier's strength on particular data types and improves prediction accuracy. WDB outperforms C4.5, naive Bayes and random forest on imbalanced data sets, effectively improving the classification accuracy of the minority class.
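A minimal WDB-style sketch, assuming scikit-learn: a CART decision tree stands in for C4.5, and each weight is learned as the model's precision on a validation split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, weights=[0.85], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            stratify=y, random_state=0)

models, weights = [DecisionTreeClassifier(random_state=0), GaussianNB()], []
for m in models:
    m.fit(X_tr, y_tr)
    weights.append(precision_score(y_val, m.predict(X_val)))  # "weight learning"

# Algebraic combination: precision-weighted average of class probabilities.
proba = sum(w * m.predict_proba(X_val) for w, m in zip(weights, models)) / sum(weights)
pred = proba.argmax(axis=1)
print("weights:", [round(w, 3) for w in weights],
      "minority recall:", round((pred[y_val == 1] == 1).mean(), 3))
```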

11.
This paper proposes a new measure for ensemble pruning via directed hill climbing, dubbed Uncertainty Weighted Accuracy (UWA), which takes into account the uncertainty of the current ensemble's decision. Empirical results on 30 data sets show that pruning a heterogeneous ensemble with the proposed measure yields significantly higher accuracy than state-of-the-art measures and other baseline methods, while keeping only a small fraction of the original models. Besides the evaluation measure, the paper also studies two other parameters of directed hill-climbing ensemble pruning methods, the search direction and the evaluation dataset, with interesting conclusions on appropriate values.  相似文献   
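A minimal sketch of directed hill-climbing pruning in the backward search direction (one of the parameters the paper studies), with plain accuracy on a held-out evaluation set standing in for the UWA measure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_ev, y_tr, y_ev = train_test_split(X, y, random_state=0)  # evaluation set
ensemble = list(BaggingClassifier(DecisionTreeClassifier(), n_estimators=20,
                                  random_state=0).fit(X_tr, y_tr).estimators_)

def measure(models):                        # plain accuracy as a stand-in for UWA
    votes = np.mean([m.predict(X_ev) for m in models], axis=0)
    return ((votes >= 0.5).astype(int) == y_ev).mean()

while len(ensemble) > 1:
    current = measure(ensemble)
    scores = [(measure([m for j, m in enumerate(ensemble) if j != i]), i)
              for i in range(len(ensemble))]
    best_score, idx = max(scores)           # best sub-ensemble after one removal
    if best_score < current:                # stop once every removal hurts
        break
    ensemble.pop(idx)

print(f"kept {len(ensemble)}/20 models, eval accuracy {measure(ensemble):.3f}")
```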

12.
Compared with plain ensemble learning, ensemble pruning searches for an optimal subset among multiple classifiers, improving generalization and simplifying the ensemble. Pareto ensemble pruning considers both classifier accuracy and ensemble size, treating both as optimization objectives. However, it accounts only for base-classifier accuracy and ensemble size while ignoring the diversity among classifiers, so the selected classifiers end up highly similar to one another. This paper proposes a Pareto ensemble pruning algorithm that incorporates diversity: classifier diversity and accuracy are combined into the first optimization objective and ensemble size serves as the second, achieving multi-objective optimization. Experiments show that, at comparable ensemble sizes, the improved algorithm performs better than Pareto ensemble pruning thanks to the incorporated diversity.
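A rough bi-objective sketch: objective one blends sub-ensemble accuracy with mean pairwise disagreement (an illustrative diversity measure and an arbitrary 0.5/0.5 blend), objective two is ensemble size, and the Pareto front over fully enumerated subsets is kept. Exhaustive enumeration replaces the paper's actual search procedure.

```python
import itertools
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_ev, y_tr, y_ev = train_test_split(X, y, random_state=0)
preds = np.array([m.predict(X_ev) for m in
                  BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                    random_state=0).fit(X_tr, y_tr).estimators_])

def objectives(idx):
    sub = preds[list(idx)]
    acc = ((sub.mean(axis=0) >= 0.5).astype(int) == y_ev).mean()
    pairs = list(itertools.combinations(range(len(sub)), 2))
    div = np.mean([(sub[i] != sub[j]).mean() for i, j in pairs]) if pairs else 0.0
    return 0.5 * acc + 0.5 * div, len(idx)   # maximize first, minimize second

subsets = [c for r in range(1, 11) for c in itertools.combinations(range(10), r)]
scored = [(objectives(c), c) for c in subsets]
front = [(obj, c) for obj, c in scored       # keep the non-dominated subsets
         if not any(o[0] >= obj[0] and o[1] <= obj[1] and o != obj
                    for o, _ in scored)]
print(f"{len(front)} Pareto-optimal sub-ensembles out of {len(subsets)}")
```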

13.
Ensemble pruning deals with the selection of base learners prior to combination in order to improve prediction accuracy and efficiency. The ensemble literature points out that for an ensemble classifier to achieve higher prediction accuracy, it is critical that it consist of accurate classifiers which are, at the same time, as diverse as possible. In this paper, a novel ensemble pruning method, called PL-bagging, is proposed. To attain the balance between diversity and accuracy of base learners, PL-bagging employs positive Lasso to assign weights to base learners in the combination step. Simulation studies and theoretical investigation showed that PL-bagging filters out redundant base learners while assigning higher weights to more accurate ones. This improved weighting scheme results in higher classification accuracy, and the improvement becomes even more significant as the ensemble size increases. The performance of PL-bagging was compared with state-of-the-art ensemble pruning methods for the aggregation of bootstrapped base learners on 22 real and 4 synthetic datasets. The results indicate that PL-bagging significantly outperforms state-of-the-art ensemble pruning methods such as Boosting-based pruning and Trimmed bagging.  相似文献   
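A minimal PL-bagging-style sketch, assuming scikit-learn: a positivity-constrained Lasso (the alpha value is an arbitrary illustrative choice) weights the bootstrapped learners' held-out predictions, and zero weights prune redundant members.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_ev, y_tr, y_ev = train_test_split(X, y, random_state=0)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30,
                        random_state=0).fit(X_tr, y_tr)

# Columns = each base learner's held-out probability for class 1.
P = np.column_stack([m.predict_proba(X_ev)[:, 1] for m in bag.estimators_])
lasso = Lasso(alpha=0.01, positive=True).fit(P, y_ev)  # positive Lasso weights
w = lasso.coef_                                        # zero weight = pruned member
pred = ((P @ w + lasso.intercept_) >= 0.5).astype(int)
print(f"kept {np.count_nonzero(w)}/30 learners, accuracy {(pred == y_ev).mean():.3f}")
```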

14.
Theory and experiments show that an ensemble with a larger margin distribution on the training set tends to generalize better. This paper introduces the margin concept into ensemble pruning and uses it to guide the design of pruning methods. On this basis, a measure (MBM) is constructed to evaluate the importance of a base classifier relative to the ensemble, and a greedy ensemble selection method (MBMEP) is proposed to reduce ensemble size while improving classification accuracy. Experiments on 30 randomly chosen UCI data sets show that, compared with other advanced greedy ensemble selection algorithms, the sub-ensembles selected by MBMEP generalize better.
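A minimal margin-guided sketch: the sub-ensemble is grown greedily to maximize the mean voting margin on the training set. This simple objective and the fixed target size are stand-ins for the paper's MBM measure and stopping rule.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
preds = np.array([m.predict(X) for m in
                  BaggingClassifier(DecisionTreeClassifier(), n_estimators=20,
                                    random_state=0).fit(X, y).estimators_])

def mean_margin(idx):
    votes = preds[idx].mean(axis=0)                # fraction voting for class 1
    p_true = np.where(y == 1, votes, 1.0 - votes)  # support for the true class
    return (2.0 * p_true - 1.0).mean()             # voting margin in [-1, 1]

selected, pool = [], list(range(20))
for _ in range(10):                                # grow to a fixed target size
    gains = [mean_margin(selected + [j]) for j in pool]
    selected.append(pool.pop(int(np.argmax(gains))))

print("mean margin of pruned ensemble:", round(mean_margin(selected), 3))
```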

15.
A generalisation of bottom-up pruning is proposed as a model-level combination method for a decision tree ensemble. Bottom-up pruning on a single tree involves choosing between a subtree rooted at a node and a leaf, dependent on a pruning criterion. A natural extension to an ensemble of trees is to allow subtrees from other ensemble trees to be grafted onto a node, in addition to the operations of pruning to a leaf and leaving the existing subtree intact. Suitable pruning criteria are proposed and tested for this multi-tree pruning context. Gains in both performance and, in particular, compactness over individually pruned trees are observed in tests performed on a number of datasets from the UCI database. The method is further illustrated on a churn prediction problem in the telecommunications domain.  相似文献   

16.
Ensemble pruning deals with the reduction of base classifiers prior to combination in order to improve generalization and prediction efficiency. Existing ensemble pruning algorithms require considerable pruning time. This paper presents a fast pruning approach: pattern mining based ensemble pruning (PMEP). In this algorithm, the prediction results of all base classifiers are organized as a transaction database, and an FP-Tree structure is used to compact the prediction results. Then a greedy pattern mining method is explored to find the ensemble of size k. After obtaining the ensembles of all possible sizes, the one with the best accuracy is returned. Compared with Bagging, GASEN, and Forward Selection, experimental results show that PMEP achieves the best prediction accuracy and keeps the size of the final ensemble small; more importantly, its pruning time is much shorter than that of other ensemble pruning algorithms.  相似文献   

17.
Neural network ensembles (sometimes referred to as committees or classifier ensembles) are effective techniques to improve the generalization of a neural network system. Combining a set of neural network classifiers whose error distributions are diverse can generate better results than any single classifier. In this paper, some methods for creating ensembles are reviewed, including the following approaches: methods of selecting diverse training data from the original source data set, constructing different neural network models, selecting ensemble nets from ensemble candidates and combining ensemble members' results. In addition, new results on ensemble combination methods are reported.  相似文献   

18.
Path selection is a key step in knowledge base question answering (KBQA), where semantic similarity is commonly used to score candidate paths against the question. Since the test set contains many unseen relations, this paper proposes training the semantic-similarity model with dynamic negative sampling to enrich the diversity of relations in the training set, which significantly improves model performance. For the combinatorial explosion of candidate paths on complex questions, two path-pruning methods are compared: a classification-based method and a beam-search-based method. On the CCKS 2019-CKBQA evaluation data set, which contains both simple and complex questions, the approach performs well: the single-model system reaches an average F1 of 0.694 on the test set, rising to 0.731 after system fusion.
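A minimal beam-search path-pruning sketch: candidate paths are extended one relation at a time, and only the top-k paths by similarity score survive each step. The toy relation set and scoring function are placeholders for a knowledge base and a trained semantic-similarity model.

```python
import heapq

def beam_search(expand, score, depth, beam_size):
    """Keep only the beam_size best-scoring candidate paths at each step."""
    beam = [(score([]), [])]
    for _ in range(depth):
        cands = [(score(p + [r]), p + [r]) for _, p in beam for r in expand(p)]
        if not cands:
            break
        beam = heapq.nlargest(beam_size, cands, key=lambda t: t[0])
    return beam

# Toy stand-ins: candidate relations to extend a path with, and a fake
# similarity scorer (a trained model would supply both in practice).
RELATIONS = ["birthplace", "spouse", "employer", "capital_of"]
expand = lambda path: RELATIONS if len(path) < 2 else []
score = lambda path: sum(r in {"birthplace", "capital_of"} for r in path) - 0.1 * len(path)

for s, p in beam_search(expand, score, depth=2, beam_size=3):
    print(round(s, 2), p)
```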

19.
AdaBoost is a highly effective ensemble learning method that combines several weak learners to produce a strong committee with higher accuracy. However, similar to other ensemble methods, AdaBoost uses a large number of base learners to produce the final outcome while addressing high-dimensional data. Thus, it poses a critical challenge in the form of high memory-space consumption. Feature selection methods can significantly reduce dimensionality in regression and have been established to be applicable in ensemble pruning. By pruning the ensemble, it is possible to generate a simpler ensemble with fewer base learners but a higher accuracy. In this article, we propose the minimax concave penalty (MCP) function to prune an AdaBoost ensemble to simplify the model and improve its accuracy simultaneously. The MCP penalty function is compared with LASSO and SCAD in terms of performance in pruning the ensemble. Experiments performed on real datasets demonstrate that MCP-pruning outperforms the other two methods. It can reduce the ensemble size effectively, and generate marginally more accurate predictions than the unpruned AdaBoost model.  相似文献   

20.
The main goal is to find a fast pruning method for Bagging, so as to reduce the storage the algorithm occupies, speed up computation, and realize the potential to improve classification accuracy. Traditional selective-ensemble research focuses on diversity among base learners; this work instead studies the problem from the angle of homogeneity and proposes a completely new idea for selective ensembles. By repeatedly selecting the worst member of the base-learner pool, the Bagging ensemble is pruned quickly and hierarchically, yielding a new algorithm whose learning speed is close to Bagging's while improving on its performance. The new algorithm's training time is clearly lower than GASEN's while its performance is similar, and it retains the same parallel-processing capability as Bagging.
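A minimal sketch of worst-first pruning: the individually weakest member is dropped repeatedly as long as the combined validation accuracy does not fall. The stopping rule and validation split are illustrative, not the paper's exact hierarchical scheme.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
ens = list(BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                             random_state=0).fit(X_tr, y_tr).estimators_)

def ens_acc(models):
    votes = np.mean([m.predict(X_va) for m in models], axis=0)
    return ((votes >= 0.5).astype(int) == y_va).mean()

while len(ens) > 1:
    worst = min(ens, key=lambda m: m.score(X_va, y_va))  # individually worst member
    if ens_acc([m for m in ens if m is not worst]) < ens_acc(ens):
        break                                            # stop once removal hurts
    ens.remove(worst)

print(f"kept {len(ens)}/25 learners, val accuracy {ens_acc(ens):.3f}")
```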

