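An Active Learning Algorithm by Selecting the Most Possibly Wrong-Predicted Instances
Cite this article: Long Jun, Yin Jianping, Zhu En, Cai Zhiping. An Active Learning Algorithm by Selecting the Most Possibly Wrong-Predicted Instances [J]. Journal of Computer Research and Development, 2008, 45(3): 472-478.
Authors: Long Jun, Yin Jianping, Zhu En, Cai Zhiping
Affiliation: School of Computer Science, National University of Defense Technology, Changsha 410073
Funding: National Natural Science Foundation of China; Natural Science Foundation of Hunan Province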
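Abstract: By selecting only the most informative instances and submitting them to an expert for labeling, active learning can substantially reduce the burden of labeling large numbers of unlabeled instances. Sampling is a key factor in the performance of an active learning algorithm. Mainstream sampling methods generally try to choose instances that split the version space as evenly as possible. This approach, however, assumes that every hypothesis in the version space is equally likely to be the target function, an assumption that does not hold in real-world problems. This paper analyzes the limitations of the version-space-halving strategy and then proposes MPWPS (the most possibly wrong-predicted sampling), a heuristic sampling algorithm that aims to shrink the version space as much as possible. At each sampling step, MPWPS selects the instance that the current classifier is most likely to predict incorrectly; whenever that prediction is indeed wrong, more than half of the hypotheses in the version space are eliminated. As a result, to reach the same classification accuracy, the classifier needs fewer samples than with mainstream active learning algorithms that aim to halve the version space. Experiments show that on most datasets MPWPS needs fewer samples than traditional sampling algorithms to reach the same target accuracy.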
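The abstract's argument about a single query can be made concrete on a toy problem. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: it assumes a finite class of nine 1-D threshold hypotheses, a hypothetical prior over which hypothesis is the target, and a "current classifier" that is a plain count-based majority vote over the version space (the abstract does not say how MPWPS estimates the probability of a wrong prediction). `halving_query` picks the point that splits the version space most evenly by count; `mpwps_query` picks the point on which the majority vote is most likely wrong under the prior. All names here are hypothetical.

```python
from fractions import Fraction

# Assumed toy setting: 1-D threshold hypotheses h_t(x) = 1 if x >= t else 0.
THRESHOLDS = list(range(1, 10))            # 9 hypotheses, t = 1..9
POINTS = [k + 0.5 for k in range(10)]      # candidate query points 0.5 .. 9.5


def predict(t, x):
    """Label that threshold hypothesis t assigns to point x."""
    return 1 if x >= t else 0


def split(version_space, x):
    """Return (hypotheses voting 1, hypotheses voting 0) on point x."""
    ones = [t for t in version_space if predict(t, x) == 1]
    zeros = [t for t in version_space if predict(t, x) == 0]
    return ones, zeros


def halving_query(version_space):
    """Point whose vote split is closest to 50/50 by hypothesis count (first on ties)."""
    def imbalance(x):
        ones, zeros = split(version_space, x)
        return abs(len(ones) - len(zeros))
    return min(POINTS, key=imbalance)


def wrong_probability(version_space, prior, x):
    """Prior mass of hypotheses that disagree with the count-based majority vote on x."""
    ones, zeros = split(version_space, x)
    minority = zeros if len(ones) >= len(zeros) else ones
    total = sum(prior[t] for t in version_space)
    return Fraction(sum(prior[t] for t in minority), total)


def mpwps_query(version_space, prior):
    """Point the majority-vote classifier is most likely to mispredict (first on ties)."""
    return max(POINTS, key=lambda x: wrong_probability(version_space, prior, x))


def expected_eliminated(version_space, prior, x):
    """Expected number of hypotheses removed by querying x, under the prior."""
    ones, zeros = split(version_space, x)
    total = sum(prior[t] for t in version_space)
    p_one = Fraction(sum(prior[t] for t in ones), total)
    # Label 1 removes the zero-voters; label 0 removes the one-voters.
    return p_one * len(zeros) + (1 - p_one) * len(ones)


uniform = {t: 1 for t in THRESHOLDS}
skewed = {t: (6 if t >= 7 else 1) for t in THRESHOLDS}  # hypothetical non-uniform prior

print(halving_query(THRESHOLDS), mpwps_query(THRESHOLDS, uniform))  # 4.5 4.5
print(mpwps_query(THRESHOLDS, skewed))                               # 5.5
print(expected_eliminated(THRESHOLDS, skewed, 5.5),
      expected_eliminated(THRESHOLDS, skewed, 4.5))                 # 115/24 25/6
```

With a uniform prior both rules choose the same point, consistent with the observation that halving implicitly treats all hypotheses as equally likely. With the skewed prior, the most-likely-wrong point removes five of nine hypotheses whenever the majority vote is wrong, and in this toy setup its expected number of eliminated hypotheses (115/24) exceeds that of the halving point (25/6).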

Keywords: active learning; sampling; version space; halving model; sample complexity
Revised: October 30, 2006

An Active Learning Algorithm by Selecting the Most Possibly Wrong-Predicted Instances
Long Jun, Yin Jianping, Zhu En, Cai Zhiping. An Active Learning Algorithm by Selecting the Most Possibly Wrong-Predicted Instances [J]. Journal of Computer Research and Development, 2008, 45(3): 472-478.
Authors: Long Jun, Yin Jianping, Zhu En, Cai Zhiping
Abstract: Active learning methods can reduce the effort of labeling large numbers of instances by selecting only the most informative examples and asking experts to label them. Sampling is a key factor in the performance of active learning. Current leading sampling methods generally choose the instance or instances that reduce the version space by half. However, the halving strategy assumes that each hypothesis in the version space is equally likely to be the target function, an assumption that does not hold in real-world problems. This paper analyzes the limitations of the halving strategy and then presents a sampling method named MPWPS (the most possibly wrong-predicted sampling), which aims to reduce the version space by more than half. At each sampling step, MPWPS chooses the instance or instances that the current classifier is most likely to predict wrongly, so that whenever the prediction is indeed wrong, more than half of the hypotheses in the version space are eliminated. Consequently, to reach the same accuracy, MPWPS requires fewer samples than existing active learning methods. Experiments show that on most datasets MPWPS samples fewer instances than traditional sampling methods to reach the same target accuracy.
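To see how a larger per-query elimination can add up to fewer samples overall, the sketch below runs both selection rules to completion in a toy setting. Everything here is an illustrative assumption rather than a detail from the paper: nine hypothetical threshold hypotheses, an assumed non-uniform prior, an exact labeling oracle standing in for the expert, and a count-based majority vote standing in for the current classifier. The hypothetical function `queries_needed` counts how many labels a rule requests before the version space shrinks to the target, and `expected_queries` averages that count over targets drawn from the prior.

```python
from fractions import Fraction

THRESHOLDS = list(range(1, 10))           # hypothetical class: h_t(x) = 1 if x >= t
POINTS = [k + 0.5 for k in range(10)]     # pool of unlabeled candidate points
PRIOR = {t: (6 if t >= 7 else 1) for t in THRESHOLDS}  # assumed prior over targets


def predict(t, x):
    return 1 if x >= t else 0


def split(vs, x):
    ones = [t for t in vs if predict(t, x) == 1]
    zeros = [t for t in vs if predict(t, x) == 0]
    return ones, zeros


def informative(vs):
    """Points on which the version space still disagrees."""
    return [x for x in POINTS if 0 < len(split(vs, x)[0]) < len(vs)]


def imbalance(vs, x):
    ones, zeros = split(vs, x)
    return abs(len(ones) - len(zeros))


def halving(vs):
    """Halving rule: most even split by hypothesis count (first on ties)."""
    return min(informative(vs), key=lambda x: imbalance(vs, x))


def wrong_mass(vs, x):
    """Unnormalized prior mass outvoted by the majority on x (argmax is unaffected)."""
    ones, zeros = split(vs, x)
    minority = zeros if len(ones) >= len(zeros) else ones
    return sum(PRIOR[t] for t in minority)


def mpwps(vs):
    """Most-likely-wrong rule for the majority-vote classifier (first on ties)."""
    return max(informative(vs), key=lambda x: wrong_mass(vs, x))


def queries_needed(strategy, target):
    """Query the oracle for `target` until one hypothesis remains."""
    vs, count = list(THRESHOLDS), 0
    while len(vs) > 1:
        x = strategy(vs)
        y = predict(target, x)                       # the expert's label
        vs = [t for t in vs if predict(t, x) == y]   # drop inconsistent hypotheses
        count += 1
    return vs[0], count


def expected_queries(strategy):
    total = sum(PRIOR.values())
    return sum(Fraction(PRIOR[t], total) * queries_needed(strategy, t)[1]
               for t in THRESHOLDS)


print(expected_queries(halving), expected_queries(mpwps))  # 7/2 37/12
```

In this toy example the halving rule needs 7/2 labels on average and the most-likely-wrong rule 37/12. These figures come only from this constructed setup and are not the paper's experimental results; with a uniform prior the advantage would not be expected to appear.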
Keywords: active learning; sampling; version space; halving model; sample complexity
This article is indexed in CNKI, VIP, Wanfang Data and other databases.