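An Improved Co-training Algorithm Based on Conditional Sample Value
Cite this article: CHENG Sheng-Jun, LIU Jia-Feng, HUANG Qing-Cheng, TANG Xiang-Long. An Improved Co-training Algorithm Based on Conditional Sample Value. Acta Automatica Sinica, 2013, 39(10): 1665-1673. doi: 10.3724/SP.J.1004.2013.01665
Authors: CHENG Sheng-Jun  LIU Jia-Feng  HUANG Qing-Cheng  TANG Xiang-Long
Affiliation: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Funding: Supported by the National Natural Science Foundation of China (61173087, 61073128) and the Natural Science Foundation of Heilongjiang Province (F201021)
Abstract: Co-training is a mainstream semi-supervised learning algorithm. In this algorithm, the classifiers under the two views iteratively select new examples for each other from the unlabeled set in order to update each other's training sets. Co-training selects the new examples according to the classifier's posterior probability output, a strategy that ignores the value of an example to the current classifier. To address this problem, this paper proposes an improved co-training style algorithm, CVCOT (Conditional value-based co-training), which optimizes co-training with a selection strategy based on the conditional value of examples. The conditional value of an unlabeled example is defined, and the classifier under each view selects new examples according to it, thereby updating the training set. This strategy both ensures the reliability of the labels assigned to new examples and gives priority to adding informative, high-value examples to the training set, which effectively refines the classifiers. Experimental results on UCI data sets and on a web page classification application show that CVCOT achieves good classification performance and learning efficiency.

Keywords: machine learning   semi-supervised learning   co-training   informative example   conditional value
Received: 2012-05-08
Revised: 2012-08-02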

Conditional Value-based Co-training
CHENG Sheng-Jun, LIU Jia-Feng, HUANG Qing-Cheng, TANG Xiang-Long. Conditional Value-based Co-training. ACTA AUTOMATICA SINICA, 2013, 39(10): 1665-1673. doi: 10.3724/SP.J.1004.2013.01665
Authors: CHENG Sheng-Jun  LIU Jia-Feng  HUANG Qing-Cheng  TANG Xiang-Long
Affiliation: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Abstract: Co-training is one of the major semi-supervised learning methods. It iteratively trains two classifiers under two different views and uses the predictions of each classifier on the unlabeled examples to augment the training set of the other. In each round of co-training, newly added examples are selected according to the classifier's posterior probability output, which neglects the value of an example with respect to the current classifier. This paper proposes an improved co-training style algorithm, termed CVCOT (conditional value-based co-training), which employs a conditional value-based strategy for selecting candidate training examples. Specifically, the conditional value of unlabeled examples in the co-training process is defined and computed, and is then used by the classifier under each view to augment the training set of the other. The new strategy not only ensures the reliability of the pseudo-labels but also tends to add more informative examples with higher values to the training sets, so the classifier under each view is refined. Experiments on UCI data sets and an application to the web page classification task indicate that CVCOT achieves better classification performance and learning efficiency.
Keywords: machine learning  semi-supervised learning  co-training  informative example  conditional value
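
The baseline selection step described in the abstract can be sketched as follows. This is a minimal illustration of standard co-training with posterior-confidence selection, not the authors' CVCOT implementation. The toy `CentroidClassifier`, the function names, and the `rounds` and `per_round` parameters are all assumptions made for the example. The abstract does not give the definition of conditional value, so it is not implemented here; the `score` argument only marks where a different selection strategy would be plugged in.

```python
import math


class CentroidClassifier:
    """Toy stand-in classifier (an assumption): softmax over negative
    squared distances to the class centroids."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids_[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        scores = [-sum((a - b) ** 2 for a, b in zip(x, self.centroids_[c]))
                  for c in self.classes_]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


def posterior_confidence(clf, x):
    """Baseline score: the largest posterior probability and its label."""
    probs = clf.predict_proba(x)
    best = max(range(len(probs)), key=probs.__getitem__)
    return probs[best], clf.classes_[best]


def co_training(labeled, labels, unlabeled, rounds=3, per_round=1,
                score=posterior_confidence):
    """labeled[v] and unlabeled[v] hold the feature vectors of view v (0 or 1).

    In every round, the classifier of each view pseudo-labels its
    highest-scoring unlabeled examples and adds them to the training
    set of the OTHER view.
    """
    train_X = [list(labeled[0]), list(labeled[1])]
    train_y = [list(labels), list(labels)]
    pool = list(range(len(unlabeled[0])))  # indices still unlabeled

    for _ in range(rounds):
        clfs = [CentroidClassifier().fit(train_X[v], train_y[v]) for v in (0, 1)]
        for v in (0, 1):
            other = 1 - v
            ranked = sorted(pool,
                            key=lambda i: score(clfs[v], unlabeled[v][i])[0],
                            reverse=True)
            for i in ranked[:per_round]:
                _, pseudo_label = score(clfs[v], unlabeled[v][i])
                train_X[other].append(unlabeled[other][i])
                train_y[other].append(pseudo_label)
                pool.remove(i)

    # Refit so the returned classifiers reflect the final training sets.
    clfs = [CentroidClassifier().fit(train_X[v], train_y[v]) for v in (0, 1)]
    return clfs, train_y
```

Any function that returns a `(score, label)` pair can be passed as `score`, which keeps the selection criterion separate from the training loop.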