
Ensemble self-training method based on active learning and confidence voting
Citation: LI Junnan, LV Jia. Ensemble self-training method based on active learning and confidence voting[J]. Computer Engineering and Applications, 2016, 52(20): 167-171.
Authors:LI Junnan  LV Jia
Affiliation:College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
Abstract: Self-training based on ensemble learning is a semi-supervised algorithm in which reliable samples are commonly selected by class voting or by averaging the confidences of the ensemble's base classifiers. Confidence-based voting strategies tend to select samples with high confidence, or samples with low confidence but a unanimous vote; the latter case may mislabel samples near the decision boundary. Moreover, when heterogeneous ensemble classifiers are used, the base classifiers may assign different labels even to a high-confidence sample, so that such samples cannot be effectively added to the labeled training set. To solve these problems, an ensemble self-training algorithm combining active learning with a confidence-voting strategy is proposed. The algorithm adjusts the voting strategy so that only unlabeled samples with both high confidence and a unanimous vote are automatically labeled, while active learning is used to have samples with inconsistent votes and low confidence labeled manually. This compensates for the shortcoming that ensemble self-training focuses only on high-confidence samples and ignores the useful information carried by low-confidence ones. Comparative experiments on UCI data sets verify the effectiveness of the proposed algorithm.
Keywords: ensemble self-training; active learning; weighted K-Nearest Neighbor (KNN); naive Bayes; confidence
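The sample-selection rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the array layout, the function name `partition_unlabeled`, and the confidence threshold are all assumptions chosen here for clarity. It shows the two disjoint pools the abstract defines: samples with a unanimous vote and high confidence are labeled automatically, while samples with inconsistent votes and low confidence are routed to a human oracle via active learning.

```python
import numpy as np

def partition_unlabeled(votes, confidences, conf_threshold=0.9):
    """Split unlabeled samples by the voting rule described in the abstract.

    votes:       (n_classifiers, n_samples) predicted class labels
    confidences: (n_classifiers, n_samples) per-classifier confidence scores
    Returns indices to auto-label, indices to send to a human oracle, and
    the (unanimous) label assigned to each auto-labeled sample.
    """
    votes = np.asarray(votes)
    confidences = np.asarray(confidences)

    # Unanimous vote: every base classifier agrees with the first one.
    unanimous = (votes == votes[0]).all(axis=0)
    # High confidence: average confidence across the ensemble meets the threshold.
    high_conf = confidences.mean(axis=0) >= conf_threshold

    auto_idx = np.where(unanimous & high_conf)[0]    # safe to self-label
    query_idx = np.where(~unanimous & ~high_conf)[0]  # ask a human annotator
    auto_labels = votes[0, auto_idx]                  # unanimous, so any row works
    return auto_idx, query_idx, auto_labels
```

Samples that fall in neither pool (e.g. unanimous but low-confidence) are simply left unlabeled for a later self-training round, which matches the abstract's rule of only auto-labeling when both conditions hold.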