Similar literature
 20 similar documents found (search time: 31 ms)
1.
Voting over Multiple Condensed Nearest Neighbors   Cited by: 4 (self-citations: 0, citations by others: 4)

2.
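To address the drop in classification accuracy that large-scale support vector machines (SVMs) suffer after feature dimensionality reduction by random projection, and drawing on dual recovery theory, a linear-kernel SVM based on dual random projection (drp-LSVM) is proposed for large-scale classification problems. First, the geometric properties of drp-LSVM are analyzed and proved: while retaining geometric advantages similar to those of the SVM based on random-projection dimensionality reduction (rp-LSVM), its separating hyperplane is closer to that of the original classifier trained on the full data. Next, to solve drp-LSVM quickly, the traditional sequential minimal optimization (SMO) algorithm is improved, and a drp-LSVM classifier based on the improved SMO algorithm is designed. Finally, experimental results show that drp-LSVM inherits the advantages of rp-LSVM while reducing classification error and improving training accuracy, and that its performance measures are closer to those of the classifier trained on the original data; the classifier based on the improved SMO algorithm both reduces memory consumption and achieves high training accuracy.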

3.
The robustness of combining diverse classifiers by majority voting has recently been illustrated in the pattern recognition literature. Furthermore, negatively correlated classifiers turned out to offer further improvement of majority voting performance, even compared with the idealised model with independent classifiers. However, negatively correlated classifiers represent a very unlikely situation in real-world classification problems, and their benefits usually remain out of reach. Nevertheless, it is theoretically possible to obtain a 0% majority voting error using a finite number of classifiers at error levels lower than 50%. We attempt to show that structuring classifiers into relevant multistage organisations can widen this boundary, as well as the limits of majority voting error, even more. Introducing discrete error distributions for analysis, we show how majority voting errors and their limits depend upon the parameters of a multiple classifier system with hardened binary outputs (correct/incorrect). Moreover, we investigate the sensitivity of boundary distributions of classifier outputs to small discrepancies modelled by random changes of votes, and propose new, more stable patterns of boundary distributions. Finally, we show how organising classifiers into different structures can be used to widen the limits of majority voting errors, and how this phenomenon can be effectively exploited. Received: 17 November 2000, Received in revised form: 27 November 2001, Accepted: 29 November 2001. Correspondence and offprint requests to: D. Ruta, Applied Computing Research Unit, Division of Computer and Information Systems, University of Paisley, High Street, Paisley PA1 2BE, UK. Email: ruta-ci0@paisley.ac.uk
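
As a point of reference for the abstract above, the idealised model with independent classifiers can be sketched in a few lines. This is only the textbook binomial baseline, not the discrete error distributions or multistage structures studied in the paper; the function name `majority_vote_error` and the assumption that every voter has the same error rate are illustrative choices.

```python
from math import comb


def majority_vote_error(n_classifiers: int, error_rate: float) -> float:
    """Probability that a strict majority of independent voters is wrong.

    Assumes (hypothetically) that all voters err independently and with the
    same probability `error_rate`; an odd voter count avoids ties.
    """
    if n_classifiers < 1 or n_classifiers % 2 == 0:
        raise ValueError("n_classifiers must be a positive odd number")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    min_wrong = n_classifiers // 2 + 1  # smallest number of wrong votes that loses
    return sum(
        comb(n_classifiers, k)
        * error_rate ** k
        * (1.0 - error_rate) ** (n_classifiers - k)
        for k in range(min_wrong, n_classifiers + 1)
    )
```

Under these assumptions the ensemble error shrinks as voters are added whenever the individual error rate is below 50%, which is the baseline against which negatively correlated classifiers are compared.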

4.
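Automatic Image Annotation Based on an Ensemble Classification Algorithm   Cited by: 2 (self-citations: 0, citations by others: 2)
蒋黎星 (Jiang Lixing), 侯进 (Hou Jin). 《自动化学报》 (Acta Automatica Sinica), 2012, 38(8): 1257-1262
In semantics-based image retrieval, automatically annotating images according to their semantics is a challenging task. This paper converts automatic image annotation into an image classification process: supervised learning classifies each image region and yields the corresponding keyword, which produces the annotation. A fast random forest (FRF) ensemble classification algorithm is adopted, which can classify and annotate large amounts of training data effectively. In experiments on the Corel dataset, FRF improves running speed over classical algorithms while classification accuracy remains stable. It has good applicability to image annotation.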

5.
The nearest neighbor (NN) classifier is the most popular non-parametric classifier. It is a simple classifier with no design phase and shows good performance. Important factors affecting the efficiency and performance of the NN classifier are (i) the memory required to store the training set, (ii) the classification time required to search for the nearest neighbor of a given test pattern, and (iii) the number of training patterns needed to achieve a given classification accuracy, which, because of the curse of dimensionality, becomes prohibitively large when the dimensionality of the data is high. In this paper, we propose novel techniques to improve the performance of the NN classifier and at the same time to reduce its computational burden. These techniques are broadly based on: (i) overlap based pattern synthesis, which can generate a larger number of artificial patterns than the number of input patterns and thus can reduce the curse of dimensionality effect, (ii) a compact representation of the given set of training patterns called overlap pattern graph (OLP-graph), which can be incrementally built by scanning the training set only once, and (iii) an efficient NN classifier called OLP-NNC, which works directly with the OLP-graph and does implicit overlap based pattern synthesis. A comparison based on experimental results is given between some of the relevant classifiers. The proposed schemes are suitable for applications dealing with large and high dimensional datasets like those in data mining.

6.
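Introducing ensemble learning into incremental learning can markedly improve learning performance. Most recent work on ensemble-based incremental learning combines multiple homogeneous classifiers by weighted voting, which does not resolve the stability-plasticity dilemma of incremental learning well. To address this, an incremental learning algorithm based on an ensemble of heterogeneous classifiers is proposed. During training, to make the model more stable, multiple base classifiers are trained on the new data and added to the heterogeneous ensemble, while a locality-sensitive hash table stores a sketch of the data for later nearest-neighbor lookup of test samples; to adapt to changing data, the newly acquired data are also used to update the voting weights of the base classifiers in the ensemble. When predicting the class of a test sample, the data in the locality-sensitive hash table that are similar to the test sample serve as a bridge for computing each base classifier's dynamic weight for that sample, and the class is decided by combining the voting weights and dynamic weights of the base classifiers. Comparative experiments show that the incremental algorithm has relatively high stability and generalization ability.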

7.
We present attribute bagging (AB), a technique for improving the accuracy and stability of classifier ensembles induced using random subsets of features. AB is a wrapper method that can be used with any learning algorithm. It establishes an appropriate attribute subset size and then randomly selects subsets of features, creating projections of the training set on which the ensemble classifiers are built. The induced classifiers are then used for voting. This article compares the performance of our AB method with bagging and other algorithms on a hand-pose recognition dataset. It is shown that AB gives consistently better results than bagging, both in accuracy and stability. The performance of ensemble voting in bagging and the AB method as a function of the attribute subset size and the number of voters for both weighted and unweighted voting is tested and discussed. We also demonstrate that ranking the attribute subsets by their classification accuracy and voting using only the best subsets further improves the resulting performance of the ensemble.
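
The core loop described in the abstract above (draw random feature subsets, train one classifier per projection, then vote) might look roughly like the following. This is a minimal sketch, not the authors' implementation: the nearest-centroid base learner, the function names, and the fixed `subset_size` argument are all assumptions made for illustration, and the paper's steps for choosing the subset size and ranking subsets are omitted.

```python
import random
from collections import Counter


def train_centroids(X, y, features):
    """Hypothetical base learner: per-class centroids on the chosen features."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, features, row):
    """Return the label whose centroid is closest on the chosen features."""
    def sq_dist(centroid):
        return sum((row[f] - centroid[j]) ** 2 for j, f in enumerate(features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def attribute_bagging(X, y, subset_size, n_voters, seed=0):
    """Train one base learner per random feature subset (unweighted variant)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_voters):
        features = rng.sample(range(n_features), subset_size)
        ensemble.append((features, train_centroids(X, y, features)))
    return ensemble


def vote(ensemble, row):
    """Unweighted majority vote over the ensemble members."""
    ballots = Counter(predict_centroids(c, f, row) for f, c in ensemble)
    return ballots.most_common(1)[0][0]
```

Because AB is described as a wrapper method, the base learner here could be swapped for any other classifier that trains on a projected copy of the training set.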

8.
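To address imbalanced classification and the "curse of dimensionality" in web spam detection, a binary classifier algorithm based on random forest (RF) and an under-sampling ensemble is proposed. First, under-sampling is used to sample the majority class of the training set into multiple sub-sample sets, each of which is merged with the minority-class samples to form multiple balanced training subsets; then a random forest classifier is trained on each training subset; finally, the random forest classifiers classify the test set, and voting determines the final class of each test sample. Experiments on the WEBSPAM UK-2006 dataset show that, for web spam detection, the ensemble classifier algorithm outperforms the random forest algorithm and its Bagging and Adaboost ensembles, improving accuracy, F1 measure, and area under the ROC curve (AUC) by at least 14%, 13%, and 11%, respectively. Compared with the results of the winning team of Web spam challenge 2007, the algorithm improves the F1 measure by at least 1% and achieves the best result on AUC.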
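
The first step of the abstract above, building balanced training subsets by under-sampling the majority class, can be sketched as follows. The abstract does not say whether the majority sub-samples are disjoint or how many are drawn, so the disjoint-chunk scheme, the dropping of leftover majority samples, and the name `balanced_subsets` are assumptions for illustration only.

```python
import random


def balanced_subsets(majority, minority, seed=0):
    """Pair disjoint, minority-sized chunks of the majority class with the
    full minority class, yielding balanced training subsets.

    Assumption: chunks are disjoint and leftover majority samples are dropped;
    the source abstract does not specify the sampling scheme.
    """
    if not minority:
        raise ValueError("minority class must not be empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    size = len(minority)
    n_subsets = max(1, len(shuffled) // size)
    subsets = []
    for i in range(n_subsets):
        chunk = shuffled[i * size:(i + 1) * size]
        subsets.append(chunk + list(minority))
    return subsets
```

Each returned subset would then train one classifier (a random forest in the abstract), with the final label decided by voting across those classifiers.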

9.
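师彦文 (Shi Yanwen), 王宏杰 (Wang Hongjie). 《计算机科学》 (Computer Science), 2017, 44(Z11): 98-101
For effective classification of imbalanced datasets, a classifier combining cost-sensitive learning with the random forest algorithm is proposed. First, a new impurity measure is proposed that considers not only the total cost of the decision tree but also the cost differences of the same node for different samples; second, the random forest algorithm is run, sampling the dataset K times to build K base classifiers; then, based on the proposed impurity measure, decision trees are built with the classification and regression tree (CART) algorithm, forming a forest of decision trees; finally, the random forest makes classification decisions by voting. In experiments on the UCI repository, the classifier performs well on three measures (classification accuracy, AUC, and the Kappa coefficient) compared with the traditional random forest and existing cost-sensitive random forest classifiers.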

10.
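To address the low accuracy of single-classifier methods in rolling bearing fault diagnosis, the scarcity of labeled fault samples, and the high dimensionality of the feature space, a Co-Forest bearing fault diagnosis algorithm that combines co-training with ensemble learning is proposed. Co-Forest is a co-training algorithm in semi-supervised learning; it contains multiple base classifiers and estimates confidence in co-training by voting. Time-domain and frequency-domain feature indicators are extracted from the vibration signals of rolling bearings. The base classifiers are trained repeatedly with a small number of labeled samples and a large number of unlabeled samples. The base classifiers are then combined to diagnose rolling bearing faults. Experimental results show that Co-Forest achieves higher accuracy in bearing fault diagnosis than co-training algorithms of the same type (Co-Training, Tri-Training); compared with the ISS-LPP and SS-LLTSA algorithms, which currently target high-dimensional feature vectors and scarce labeled samples, Co-Forest maintains very high diagnostic accuracy without dimensionality reduction and with simple parameter settings, giving it practical value.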

11.
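To address the "curse of dimensionality" and imbalanced classification in web spam detection, a binary classifier algorithm based on immune clonal feature selection and an under-sampling (US) ensemble is proposed. First, under-sampling is used to sample the majority class of the training set into multiple sample sets whose sizes are close to that of the minority class, each of which is merged with the minority-class samples to form multiple balanced training subsets; then an immune clonal algorithm is designed to select multiple optimal feature subsets; the balanced subsets are projected onto the optimal feature subsets to generate multiple views of the balanced data; finally, random forest (RF) classifiers classify the test samples, and simple voting determines their final class. Experimental results on the WEBSPAM UK-2006 dataset show that, for web spam detection, the ensemble classifier algorithm improves accuracy, F1 measure, AUC, and other metrics by more than 11% compared with the random forest algorithm and its Bagging and AdaBoost ensembles; compared with the other best reported results, it improves the F1 measure by 2% and achieves the best AUC.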

12.
Bagging, Boosting and the Random Subspace Method for Linear Classifiers   Cited by: 6 (self-citations: 0, citations by others: 6)
Recently bagging, boosting and the random subspace method have become popular combining techniques for improving weak classifiers. These techniques are designed for, and usually applied to, decision trees. In this paper, in contrast to a common opinion, we demonstrate that they may also be useful in linear discriminant analysis. Simulation studies, carried out for several artificial and real data sets, show that the performance of the combining techniques is strongly affected by the small sample size properties of the base classifier: boosting is useful for large training sample sizes, while bagging and the random subspace method are useful for critical training sample sizes. Finally, a table describing the possible usefulness of the combining techniques for linear classifiers is presented. Received: 03 November 2000, Received in revised form: 02 November 2001, Accepted: 13 December 2001

13.
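Co-S3OM: A Cooperative Semi-supervised Classification Algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
To improve the effectiveness of semi-supervised classification, a semi-supervised classification algorithm based on the SOM neural network and co-training, Co-S3OM (coordination semi-supervised SOM), is proposed. The limited labeled samples are divided into three equal, non-overlapping training sets, and an improved supervised SOM algorithm (SSOM) is used to train three individual classifiers. Joint voting by the three classifiers mines the information hidden in the unlabeled samples and enlarges the labeled set; the training set of each classifier is expanded in turn, and the final classifier is generated. Experiments on UCI datasets show that Co-S3OM achieves a high labeling rate and classification rate.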

14.
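To address the insufficient data storage, low recognition efficiency, and poor scalability of traditional action recognition methods, this paper represents human actions with the data of human joint points in space and models action recognition with a self-built action dataset and the random forest algorithm of the Spark MLlib library. To improve the generalization ability of the recognition model, the paper exploits the parallel, fast-iteration characteristics of algorithms on the Spark platform and proposes a weighted majority voting algorithm over multiple random forests. Experimental results show that classification accuracy rises markedly as the number of base classifiers increases; beyond 5 base classifiers, the recognition accuracy stabilizes above 95%. The effectiveness of the method is also verified on the MSR Daily 3D and MSRC-12 datasets.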
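
The combination step named in the abstract above, weighted majority voting over several base classifiers, reduces to summing a weight per predicted label. The sketch below is generic and does not use Spark; the abstract does not state how the weights are obtained, so treating them as given numbers (for example, each forest's validation accuracy) is an assumption, as is the name `weighted_majority_vote`.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    `predictions[i]` is the label proposed by base classifier i and
    `weights[i]` its (hypothetical, externally supplied) voting weight.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

With equal weights this is plain majority voting; unequal weights let one reliable classifier outvote several weaker ones.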

15.
Various fusion functions for classifier combination have been designed to optimize the results of ensembles of classifiers (EoC). We propose a pairwise fusion matrix (PFM) transformation, which produces reliable probabilities for use in classifier combination and can be combined with most existing fusion functions for combining classifiers. The PFM requires only crisp class label outputs from classifiers, and is suitable for high-class problems or problems with few training samples. Experimental results suggest that the performance of a PFM can be a notch above that of the simple majority voting rule (MAJ), and a PFM can work on problems where a behavior-knowledge space (BKS) might not be applicable.

16.
We propose and investigate the fuzzy ARTMAP neural network in offline and online classification of fluorescence in situ hybridization image signals enabling clinical diagnosis of numerical genetic abnormalities. We evaluate the classification task (detecting several abnormalities separately or simultaneously), classifier paradigm (monolithic or hierarchical), ordering strategy for the training patterns (averaging or voting), training mode (for one epoch, with validation or until completion) and model sensitivity to parameters. We find the fuzzy ARTMAP accurate in accomplishing both tasks, requiring only very few training epochs. Also, selecting a training ordering by voting is more precise than averaging over orderings. If trained for only one epoch, the fuzzy ARTMAP provides fast, yet stable and accurate learning as well as insensitivity to model complexity. Early stopping of training using a validation set reduces the fuzzy ARTMAP complexity, as for other machine learning models, but cannot improve accuracy beyond that achieved when training is completed. Unlike other machine learning models, the fuzzy ARTMAP does not lose but gains accuracy when overtrained, although its number of categories increases. Learned incrementally, the fuzzy ARTMAP reaches its ultimate accuracy very fast, obtaining most of its data representation capability and accuracy from only a few examples. Finally, the fuzzy ARTMAP accuracy for this domain is comparable with those of the multilayer perceptron and support vector machine and superior to those of the naive Bayesian and linear classifiers.

17.
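To combine projection of high-dimensional rich model features with a classifier and reduce the detection error for stego images, a steganalysis method is proposed that partitions the high-dimensional rich model features and applies a mixed-kernel feature projection algorithm. The high-dimensional features are split lengthwise into several feature blocks; each block is projected, and the projected blocks are concatenated into a new feature. A nonlinear mixed kernel function is designed to replace the single kernel function in feature projection, to cope with the very large sample size and the irregularity of multidimensional data. The projected features are classified by an FLD (Fisher Linear Discriminant) ensemble classifier. Experimental results show that the method further reduces the detection error rate for stego images and effectively lowers the memory required at run time.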

18.
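Compressive Tracking Combined with the Predicted Target Position   Cited by: 1 (self-citations: 0, citations by others: 1)
Objective: A compressive tracking algorithm combined with the predicted target position is proposed to improve tracking accuracy. Method: A sparse Toeplitz matrix with random spacing is chosen as the projection matrix to compress the original multi-scale Haar-like features; the distance weight between each sample and the position predicted under the Mean Shift framework is then fed into a Bayes classifier to form a discriminant function separating background from target; finally, the parameter update scheme is optimized, and a parameter-adaptive learning mode is proposed. Results: In a comparison with 6 currently popular tracking algorithms on 20 challenging sequences, the proposed algorithm's average tracking success rate is nearly 27% higher than that of the compressive tracking algorithm, and its average running time is 0.15 s per frame. Conclusion: The paper adopts a compressive tracking algorithm combined with the predicted position and a nonlinear parameter learning mode in the parameter update stage; experiments show that the tracker is more robust than general tracking algorithms, handles occlusion better, and produces smoother tracking.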

19.
Stable orthogonal local discriminant embedding (SOLDE) is a recently proposed dimensionality reduction method, in which the similarity, diversity and interclass separability of the data samples are well utilized to obtain a set of orthogonal projection vectors. By combining multiple features of data, it outperforms many prevalent dimensionality reduction methods. However, the orthogonal projection vectors are obtained by a step-by-step procedure, which makes it computationally expensive. By generalizing the objective function of SOLDE to a trace ratio problem, we propose a stable and orthogonal local discriminant embedding using the trace ratio criterion (SOLDE-TR) for dimensionality reduction. An iterative procedure is provided to solve the trace ratio problem, which makes the SOLDE-TR method always faster than SOLDE. The projection vectors of SOLDE-TR always converge to a global solution, and its performance is always better than that of SOLDE. Experimental results on two public image databases demonstrate the effectiveness and advantages of the proposed method.

20.
Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
