
Ensemble self-training method based on active learning and confidence voting
Citation: LI Junnan, LV Jia. Ensemble self-training method based on active learning and confidence voting[J]. Computer Engineering and Applications, 2016, 52(20): 167-171.
Authors:LI Junnan  LV Jia
Affiliation:College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
Abstract: Self-training based on ensemble learning is a semi-supervised algorithm in which reliable samples are commonly selected by class voting or by averaging the confidences of the ensemble's base classifiers. Confidence-based voting strategies tend to select samples with high confidence, or samples with low confidence but a unanimous vote; the latter case may mislabel samples near the decision boundary. Moreover, when heterogeneous ensemble classifiers are used, the base classifiers may assign different labels even to a high-confidence sample, so that such samples cannot be effectively added to the labeled training set. To solve these problems, an ensemble self-training algorithm combining active learning with a confidence-voting strategy is proposed. The algorithm adjusts the voting strategy so that only unlabeled samples with both high confidence and a unanimous vote are automatically labeled, while active learning is used to have samples with inconsistent votes and low confidence labeled manually. This compensates for the shortcoming that ensemble self-training focuses only on high-confidence samples and ignores the useful information carried by low-confidence ones. Comparative experiments on UCI data sets verify the effectiveness of the proposed algorithm.
Keywords: ensemble self-training; active learning; weighted K-Nearest Neighbor (KNN); naive Bayes; confidence
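The sample-selection rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the array layout, the function name `partition_unlabeled`, and the confidence threshold are all assumptions chosen here for clarity. It shows the two disjoint pools the abstract defines: samples with a unanimous vote and high confidence are labeled automatically, while samples with inconsistent votes and low confidence are routed to a human oracle via active learning.

```python
import numpy as np

def partition_unlabeled(votes, confidences, conf_threshold=0.9):
    """Split unlabeled samples by the voting rule described in the abstract.

    votes:       (n_classifiers, n_samples) predicted class labels
    confidences: (n_classifiers, n_samples) per-classifier confidence scores
    Returns indices to auto-label, indices to send to a human oracle, and
    the (unanimous) label assigned to each auto-labeled sample.
    """
    votes = np.asarray(votes)
    confidences = np.asarray(confidences)

    # Unanimous vote: every base classifier agrees with the first one.
    unanimous = (votes == votes[0]).all(axis=0)
    # High confidence: average confidence across the ensemble meets the threshold.
    high_conf = confidences.mean(axis=0) >= conf_threshold

    auto_idx = np.where(unanimous & high_conf)[0]    # safe to self-label
    query_idx = np.where(~unanimous & ~high_conf)[0]  # ask a human annotator
    auto_labels = votes[0, auto_idx]                  # unanimous, so any row works
    return auto_idx, query_idx, auto_labels
```

Samples that fall in neither pool (e.g. unanimous but low-confidence) are simply left unlabeled for a later self-training round, which matches the abstract's rule of only auto-labeling when both conditions hold.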