
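Tri-training Algorithm Based on an Adaptive Data Editing Strategy
Citation: DENG Chao, GUO Mao-Zu. Tri-training algorithm based on an adaptive data editing strategy[J]. Chinese Journal of Computers, 2007, 30(8): 1213-1226.
Authors: DENG Chao, GUO Mao-Zu
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Funding: National Natural Science Foundation of China; Heilongjiang Provincial Science Fund for Distinguished Young Scholars; Heilongjiang Provincial Science and Technology Program for Returned Overseas Scholars; Harbin Institute of Technology research and teaching reform projects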
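Abstract: Tri-training can effectively exploit unlabeled examples to improve generalization ability. During Tri-training iterations, however, unlabeled examples are often mislabeled and become noise in the training set, which makes performance unstable. To address this shortcoming, this paper proposes a new algorithm, ADE-Tri-training (Tri-training with Adaptive Data Editing). It not only uses the RemoveOnly editing operation to identify and remove examples that may have been mislabeled in each iteration but, more importantly, adopts an adaptive strategy to determine the appropriate moments to trigger or inhibit RemoveOnly. The paper proves that, under PAC theory, a series of sufficient conditions used as decision criteria in the adaptive strategy simultaneously guarantee that the new training set grows over the iterations and that the classification error rate of the new hypothesis decreases further with each iteration. Experimental results on UCI datasets show that ADE-Tri-training achieves better classification generalization performance and robustness.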
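The two operations named in the abstract, agreement-based labeling by the other two classifiers and RemoveOnly editing of the newly labeled examples, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which criterion RemoveOnly uses to flag a mislabeled example, so the sketch assumes a Wilson-style k-nearest-neighbour majority vote over the current labeled set, and the function names `label_by_agreement` and `remove_only` are hypothetical. The adaptive trigger/inhibit decision is not modeled here.

```python
from collections import Counter
from math import dist
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Example = Tuple[Point, int]
Classifier = Callable[[Point], int]


def label_by_agreement(h_j: Classifier, h_k: Classifier,
                       unlabeled: Sequence[Point]) -> List[Example]:
    """Tri-training step: h_j and h_k label x for the third classifier h_i
    only when their predictions agree."""
    return [(x, h_j(x)) for x in unlabeled if h_j(x) == h_k(x)]


def remove_only(candidates: Sequence[Example],
                reference: Sequence[Example], k: int = 3) -> List[Example]:
    """Hypothetical RemoveOnly editing (assumed k-NN criterion): discard a newly
    labeled example whose label disagrees with the majority label of its k
    nearest neighbours in `reference`. Examples are removed, never relabeled."""
    if not reference:
        return list(candidates)
    kept = []
    for x, y in candidates:
        neighbours = sorted(reference, key=lambda ex: dist(ex[0], x))[:k]
        majority, _ = Counter(lbl for _, lbl in neighbours).most_common(1)[0]
        if majority == y:
            kept.append((x, y))
    return kept


# Toy usage: two threshold classifiers that agree on label 1 at (3.5, 1.0),
# although the nearby labeled examples are all of class 0.
def h_j(x: Point) -> int:
    return int(x[0] + x[1] > 4)


def h_k(x: Point) -> int:
    return int(x[0] > 2)


labeled = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
           ((3, 0), 0), ((3, 1), 0), ((4, 0), 0),
           ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
unlabeled = [(0.5, 0.5), (3.5, 1.0), (1.5, 4.0), (5.5, 5.5)]

candidates = label_by_agreement(h_j, h_k, unlabeled)
edited = remove_only(candidates, labeled, k=3)
print(candidates)  # (1.5, 4.0) is skipped because h_j and h_k disagree
print(edited)      # (3.5, 1.0) is removed by the k-NN vote
```

An odd k avoids ties in the two-class vote; with more classes, `Counter.most_common` breaks ties by insertion order, which a real editing rule would need to handle explicitly.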

Keywords: semi-supervised learning; data editing; adaptive strategy; PAC learnability; Tri-training
Revised: 2007-03-04

ADE-Tri-training: Tri-training with Adaptive Data Editing
DENG Chao, GUO Mao-Zu. ADE-Tri-training: Tri-training with Adaptive Data Editing[J]. Chinese Journal of Computers, 2007, 30(8): 1213-1226.
Authors: DENG Chao, GUO Mao-Zu
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Abstract: Tri-training, a Co-training style semi-supervised learning algorithm, can effectively exploit unlabeled examples to improve generalization ability. However, Tri-training is particularly susceptible to a common problem in semi-supervised learning: performance is often unstable because unlabeled examples may be mislabeled, and these labeling errors accumulate during the iterative learning process. This paper proposes a new Tri-training style algorithm named ADE-Tri-training (Tri-training with Adaptive Data Editing). ADE-Tri-training not only employs a specific data editing technique to identify and discard possibly mislabeled examples as the three classifiers label examples for one another in each iteration, but also adopts an adaptive strategy that triggers or inhibits the editing operation according to the current situation. The adaptive strategy combines five precondition theorems, all of which ensure, under PAC theory, that the classification error decreases while the size of the new training set increases from one iteration to the next. Proofs of all five theorems are given. Experiments on UCI datasets show that ADE-Tri-training exploits unlabeled examples to improve classification generalization more effectively and more stably than both Tri-training and DE-Tri-training (Tri-training with data editing but without the adaptive strategy).
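The abstract states that the decision to trigger or inhibit editing rests on PAC-style sufficient conditions, but it does not give them. The sketch below only illustrates the kind of reasoning involved, using two ingredients from the Tri-training literature rather than the paper's five theorems: the Angluin-Laird noisy-PAC bound m = c / (ε²(1 - 2η)²), which makes u = m(1 - 2η)² = c/ε² a proxy for how small the worst-case error ε can be, and the update test 0 < e_t/e_{t-1} < |L_{t-1}|/|L_t| < 1 of the original Tri-training, which requires the error to fall while the newly labeled set grows. The rule in `should_edit` (apply RemoveOnly only when the edited set yields a larger u than the unedited one) is a hypothetical illustration, and all function names are assumptions.

```python
def noise_rate(n_labeled: int, eta_labeled: float,
               n_new: int, e_new: float) -> float:
    """Noise rate eta of L ∪ L_t, assuming L has noise rate eta_labeled and
    the n_new newly labeled examples are wrong at rate e_new."""
    return (eta_labeled * n_labeled + e_new * n_new) / (n_labeled + n_new)


def pac_utility(n_labeled: int, eta_labeled: float,
                n_new: int, e_new: float) -> float:
    """u = m * (1 - 2*eta)**2. From m = c / (eps**2 * (1 - 2*eta)**2) we get
    u = c / eps**2, so a larger u means a smaller worst-case error bound eps."""
    m = n_labeled + n_new
    eta = noise_rate(n_labeled, eta_labeled, n_new, e_new)
    return m * (1 - 2 * eta) ** 2


def tri_training_update_ok(e_prev: float, n_prev: int,
                           e_t: float, n_t: int) -> bool:
    """Update test of the original Tri-training:
    0 < e_t / e_prev < n_prev / n_t < 1, i.e. the error falls, the newly
    labeled set grows, and e_t * n_t < e_prev * n_prev."""
    return e_prev > 0 and n_t > 0 and 0 < e_t / e_prev < n_prev / n_t < 1


def should_edit(n_labeled: int, eta_labeled: float,
                n_raw: int, e_raw: float,
                n_edited: int, e_edited: float) -> bool:
    """Hypothetical adaptive trigger (not the paper's theorems): run RemoveOnly
    only if the edited candidate set gives a larger PAC utility u."""
    return (pac_utility(n_labeled, eta_labeled, n_edited, e_edited)
            > pac_utility(n_labeled, eta_labeled, n_raw, e_raw))


# Toy numbers (assumed): 100 clean labeled examples; 200 raw candidates at
# 20% error; editing keeps 150 of them at 5% error.
print(pac_utility(100, 0.0, 200, 0.2))              # about 161.3
print(pac_utility(100, 0.0, 150, 0.05))             # about 220.9
print(should_edit(100, 0.0, 200, 0.2, 150, 0.05))   # True: trigger editing
print(should_edit(100, 0.0, 200, 0.2, 20, 0.0))     # False: editing removes too much
print(tri_training_update_ok(0.2, 100, 0.1, 150))   # True
```

The second `should_edit` call shows why an adaptive rule is useful at all: an editing step that discards most candidates can lower the noise rate to zero and still loosen the bound, because the loss in m outweighs the gain in (1 - 2η)².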
Keywords: semi-supervised learning; data editing; adaptive strategy; PAC learning; Tri-training
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.