
An Active Learning Algorithm by Selecting the Most Possibly Wrong-Predicted Instances

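Cite this article:	Long Jun, Yin Jianping, Zhu En, Cai Zhiping. An Active Learning Algorithm by Selecting the Most Possibly Wrong-Predicted Instances [J]. Journal of Computer Research and Development, 2008, 45(3): 472-478.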

Authors:	Long Jun, Yin Jianping, Zhu En, Cai Zhiping

Affiliation:	School of Computer, National University of Defense Technology, Changsha 410073

Funding:	National Natural Science Foundation of China; Natural Science Foundation of Hunan Province

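Abstract:	By selecting the most informative instances and submitting only those to experts for labeling, active learning algorithms can effectively reduce the burden of labeling large numbers of unlabeled instances. Sampling is a key factor in the performance of an active learning algorithm. Current mainstream sampling algorithms usually try to select instances that split the version space as evenly as possible. This approach, however, assumes that every hypothesis in the version space is equally likely to be the target function, which cannot be satisfied in real-world problems. This paper analyzes the limitations of the halving strategy and then proposes MPWPS (the most possibly wrong-predicted sampling), a heuristic sampling algorithm that aims to shrink the version space as much as possible. At each sampling step, MPWPS selects the instances that the current classifier is most likely to predict wrongly, thereby eliminating more than half of the hypotheses in the version space. As a result, the classifier needs fewer sampling rounds to reach a given classification accuracy than current mainstream active learning algorithms based on halving the version space. Experiments show that, on most datasets, MPWPS requires fewer samples than traditional sampling algorithms to reach the same target accuracy.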
Keywords:	active learning; sampling; version space; halving model; sample complexity
Revised:	October 30, 2006
An Active Learning Algorithm by Selecting the Most Possibly Wrong-Predicted Instances

Long Jun, Yin Jianping, Zhu En, Cai Zhiping. An Active Learning Algorithm by Selecting the Most Possibly Wrong-Predicted Instances [J]. Journal of Computer Research and Development, 2008, 45(3): 472-478.

Authors:	Long Jun, Yin Jianping, Zhu En, Cai Zhiping

Abstract:	Active learning methods can reduce the effort of labeling large numbers of instances by selecting only the most informative examples and asking experts to label them. Sampling is a key factor influencing the performance of active learning. Currently, the leading sampling methods generally choose the instance or instances that can reduce the version space by half. However, the strategy of halving the version space assumes that each hypothesis in the version space is equally likely to be the target function, an assumption that cannot be satisfied in real-world problems. In this paper, the limitation of the halving strategy is analyzed. A sampling method named MPWPS (the most possibly wrong-predicted sampling) is then presented, which aims to reduce the version space by more than half. While sampling, MPWPS chooses the instance or instances most likely to be predicted wrongly by the current classifier, so that more than half of the hypotheses in the version space are eliminated. When the classifiers achieve the same accuracy, the proposed MPWPS method samples fewer times than existing active learning methods. The experiments show that MPWPS samples fewer instances than traditional sampling methods on most datasets when reaching the same target accuracy.

Keywords:	active learning; sampling; version space; halving model; sample complexity
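
The abstract's central argument is that a label contradicting the current classifier can eliminate more than half of the version space, whereas a halving query eliminates exactly half. The following is a minimal sketch of that counting argument only. It is not the authors' implementation: the abstract does not say how MPWPS estimates which instance is most likely to be predicted wrongly, so that step is left out. The 1-D threshold hypotheses, the pool, the target threshold of 9, and all function names are hypothetical, and the current classifier is assumed to agree with the majority of the version space.

```python
from typing import Callable, List, Sequence

Hypothesis = Callable[[float], int]


def make_threshold(t: float) -> Hypothesis:
    """Hypothetical 1-D hypothesis: predict 1 when x >= t, else 0."""
    return lambda x: int(x >= t)


def surviving(version_space: Sequence[Hypothesis], x: float, label: int) -> List[Hypothesis]:
    """Keep only the hypotheses consistent with the revealed label of x."""
    return [h for h in version_space if h(x) == label]


def eliminated_fraction(version_space: Sequence[Hypothesis], x: float, label: int) -> float:
    """Fraction of the version space removed once x is labeled."""
    return 1.0 - len(surviving(version_space, x, label)) / len(version_space)


def halving_query(version_space: Sequence[Hypothesis], pool: Sequence[float]) -> float:
    """Pick the pool instance whose predictions split the version space most evenly."""
    def imbalance(x: float) -> float:
        ones = sum(h(x) for h in version_space)
        return abs(ones - len(version_space) / 2)
    return min(pool, key=imbalance)


# Assumed toy setup: ten threshold hypotheses, target threshold 9 (illustrative only).
version_space = [make_threshold(t) for t in range(1, 11)]
pool = [k + 0.5 for k in range(0, 11)]
target = make_threshold(9)

x_half = halving_query(version_space, pool)
print(x_half, eliminated_fraction(version_space, x_half, target(x_half)))

# At x = 8.5, eight of ten hypotheses (and hence a majority-vote classifier)
# predict 1, but the target labels it 0, so the majority is eliminated.
x_wrong = 8.5
print(x_wrong, eliminated_fraction(version_space, x_wrong, target(x_wrong)))
```

In this toy setup the halving query removes half of the hypotheses, while the instance on which the majority prediction turns out to be wrong removes eight of ten. The gain depends entirely on the prediction actually being wrong, which is why a real sampler would need some estimate of that likelihood.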
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
