
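Chinese Organization Name Recognition Based on Tri-training Semi-supervised Learning*
Cite this article: CAI Yue-hong, ZHU Qian, CHENG Xian-yi. Chinese organization name recognition based on Tri-training semi-supervised learning*[J]. Application Research of Computers, 2010, 27(1): 193-195.
Authors: CAI Yue-hong  ZHU Qian  CHENG Xian-yi
Affiliations: 1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013; Foreign Language Learning Center, Jiangsu University, Zhenjiang, Jiangsu 212013
2. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013
Funding: National Natural Science Foundation of China (60702056)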
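Abstract: To address the scarcity of annotated corpora in Chinese organization name recognition, this paper proposes an organization name recognition method based on a co-training mechanism. The algorithm uses Tri-training learning to combine a classifier based on conditional random fields (CRFs), a classifier based on support vector machines (SVMs), and a classifier based on memory-based learning (MBL) into a single classification system, and it selects newly added samples according to an optimal-utility selection strategy. Comparative experiments against the co-training method on a large-scale real-world corpus show that the method can effectively use large amounts of unlabeled data to improve the generalization ability of the algorithm.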

Keywords: Chinese organization names  semi-supervised learning  co-training  Tri-training

Chinese organization names recognition with Tri-training learning
CAI Yue-hong, ZHU Qian, CHENG Xian-yi. Chinese organization names recognition with Tri-training learning[J]. Application Research of Computers, 2010, 27(1): 193-195.
Authors: CAI Yue-hong  ZHU Qian  CHENG Xian-yi
Affiliation: a. School of Computer Science & Communication Engineering; b. Foreign Language Learning Center; Jiangsu University, Zhenjiang, Jiangsu 212013, China
Abstract: To address the data scarcity problem in Chinese organization name recognition, this paper presents a co-training style method for organization name recognition and proposes a novel selection method for Tri-training learning that uses three classifiers: CRFs, SVMs, and MBL. During Tri-training, newly labeled samples are selected by a selection model that maximizes training utility, and agreement is computed with an agreement scoring function. Experiments on a large-scale corpus show that the proposed Tri-training approach exploits unlabeled data more effectively and more stably than co-training and standard Tri-training, improving generalization ability.
Keywords: Chinese organization name recognition  semi-supervised learning  co-training  Tri-training
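
The Tri-training idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's classifiers (CRFs, SVMs, MBL), its utility-maximizing selection model, and its agreement scoring function are not specified on this page, so the sketch substitutes three toy nearest-neighbor classifiers and the plain agreement rule of basic Tri-training (an unlabeled sample is added to one classifier's training set when the other two assign it the same label). The bootstrap sampling and error-rate checks of standard Tri-training are also omitted. All names (`NearestNeighbor`, `tri_train`, `vote`) and the toy data are hypothetical.

```python
from collections import Counter


class NearestNeighbor:
    """Toy 1-NN classifier that only looks at the feature indices in `dims`.

    A stand-in for the paper's CRF/SVM/MBL classifiers (assumption).
    """

    def __init__(self, dims):
        self.dims = dims
        self.data = []

    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self

    def predict_one(self, x):
        def dist(pair):
            return sum((pair[0][d] - x[d]) ** 2 for d in self.dims)
        return min(self.data, key=dist)[1]


def tri_train(classifiers, X_lab, y_lab, X_unlab, max_rounds=5):
    """Simplified Tri-training with exactly three classifiers."""
    assert len(classifiers) == 3
    # Every classifier starts from the same labeled seed set.
    train_sets = [(list(X_lab), list(y_lab)) for _ in classifiers]
    for clf, (X, y) in zip(classifiers, train_sets):
        clf.fit(X, y)

    for _ in range(max_rounds):
        new_sets = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            X_i, y_i = list(X_lab), list(y_lab)
            for x in X_unlab:
                label_j = classifiers[j].predict_one(x)
                label_k = classifiers[k].predict_one(x)
                # Plain agreement rule; the paper uses a utility-based
                # selection model that is not described on this page.
                if label_j == label_k:
                    X_i.append(x)
                    y_i.append(label_j)
            new_sets.append((X_i, y_i))
        if new_sets == train_sets:
            break  # no training set changed, so retraining would be a no-op
        train_sets = new_sets
        for clf, (X, y) in zip(classifiers, train_sets):
            clf.fit(X, y)
    return classifiers


def vote(classifiers, x):
    """Final prediction by majority vote of the three classifiers."""
    votes = Counter(clf.predict_one(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy data: 2-D feature vectors labeled "ORG" (organization) or "O" (other).
X_lab = [(0.0, 0.0), (1.0, 1.0)]
y_lab = ["O", "ORG"]
X_unlab = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9)]

clfs = tri_train(
    [NearestNeighbor((0,)), NearestNeighbor((1,)), NearestNeighbor((0, 1))],
    X_lab, y_lab, X_unlab,
)
print(vote(clfs, (0.95, 0.9)))
```

Giving each toy classifier a different view of the features is only a cheap way to obtain diversity here; the paper obtains it by using three different learning algorithms.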
This article is indexed in CNKI, Wanfang Data, and other databases.