
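Application of the Co-training Machine Learning Method to Chinese Chunk Recognition
Cite this article:LIU Shi-yue, LI Heng, ZHANG Li, YAO Tian-shun. Application of the Co-training Machine Learning Method to Chinese Chunk Recognition[J]. Journal of Chinese Information Processing, 2005, 19(3):74-80.
Authors:LIU Shi-yue  LI Heng  ZHANG Li  YAO Tian-shun
Affiliation:Institute of Computer Software and Theory, Northeastern University, Shenyang, Liaoning 110004
Funding:Science and Technology Research Project of the Ministry of Education; joint funding program of the National Natural Science Foundation of China and Microsoft Research Asia
Abstract:This paper applies co-training, a semi-supervised machine learning method, to Chinese chunk recognition. It first defines the Chinese chunk and gives a formal definition of the co-training algorithm. It then proposes an agreement-based co-training selection method that combines an augmented hidden Markov model (Transductive HMM) and a transformation-rule-based classifier (fnTBL) into a single classification system, and compares this system with self-training. Chinese chunk recognition is carried out on a small-scale Chinese treebank corpus and a large-scale unlabeled Chinese corpus. The results improve on those obtained with the small-scale treebank corpus alone: the F-values reach 85.34% and 83.41%, improvements of 2.13% and 7.21% respectively.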

Keywords:computer application  Chinese information processing  co-training  algorithm  Chinese chunk  classifier
Article ID:1003-0077(2005)03-0073-07
Revision date:July 25, 2004

Chinese Text Chunking Using Co-training Method
LIU Shi-yue,LI Heng,ZHANG Li,YAO Tian-shun.Chinese Text Chunking Using Co-training Method[J].Journal of Chinese Information Processing,2005,19(3):74-80.
Authors:LIU Shi-yue  LI Heng  ZHANG Li  YAO Tian-shun
Affiliation:Institute of Computer Software and Theory, Northeastern University, Shenyang, Liaoning 110004, China
Abstract:This paper discusses the application of co-training, a semi-supervised machine learning method, to Chinese text chunking. We first define the Chinese chunk and then give a formal definition of the co-training algorithm. We propose an example selection method based on consistency that combines two classifiers, Transductive HMM and fnTBL, into one classification system. This system performs the Chinese text chunking task using a small-scale labeled Chinese treebank and a large-scale unlabeled Chinese corpus. The results are compared with those of self-training and of a non-co-training experiment in which only the small-scale Chinese treebank is used as training data and a single classifier (Transductive HMM or fnTBL) recognizes the Chinese chunks. The improvement is significant: the F-values of the two classifiers reach 83.41% and 85.34%, improvements of 2.13 and 7.21 points respectively.
Keywords:computer application  Chinese information processing  co-training algorithm  Chinese chunk  classifier  
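
The abstract names a consistency-based selection step but does not spell it out. What follows is only a minimal sketch of one plausible reading, in which an unlabeled sentence joins the training set when both classifiers assign it the same chunk tag sequence. The names `LookupTagger` and `co_train`, the B-NP/B-VP tag scheme, and the toy data are all hypothetical. The two lookup taggers are simple stand-ins for Transductive HMM and fnTBL, not implementations of them.

```python
from collections import Counter, defaultdict


class LookupTagger:
    """Toy stand-in classifier: most frequent chunk tag per feature value.

    feature_index selects the view: 0 = word, 1 = part-of-speech tag.
    """

    def __init__(self, feature_index):
        self.feature_index = feature_index
        self.table = {}
        self.default = "O"

    def fit(self, sentences):
        counts = defaultdict(Counter)
        all_tags = Counter()
        for tokens, tags in sentences:
            for token, tag in zip(tokens, tags):
                counts[token[self.feature_index]][tag] += 1
                all_tags[tag] += 1
        self.table = {f: c.most_common(1)[0][0] for f, c in counts.items()}
        if all_tags:
            # Unseen feature values fall back to the overall most frequent tag.
            self.default = all_tags.most_common(1)[0][0]
        return self

    def predict(self, tokens):
        return [self.table.get(t[self.feature_index], self.default) for t in tokens]


def co_train(labeled, unlabeled, rounds=3):
    """Grow the labeled set with sentences on which both views agree (assumed rule)."""
    labeled = list(labeled)
    pool = list(unlabeled)
    view_a, view_b = LookupTagger(0), LookupTagger(1)
    for _ in range(rounds):
        view_a.fit(labeled)
        view_b.fit(labeled)
        agreed, remaining = [], []
        for tokens in pool:
            tags_a, tags_b = view_a.predict(tokens), view_b.predict(tokens)
            if tags_a == tags_b:
                agreed.append((tokens, tags_a))  # treat agreement as a label
            else:
                remaining.append(tokens)
        if not agreed:
            break  # nothing new to learn from
        labeled.extend(agreed)
        pool = remaining
    view_a.fit(labeled)
    view_b.fit(labeled)
    return view_a, view_b, labeled, pool


# Hypothetical toy data: each token is a (word, POS) pair.
labeled_seed = [
    ([("he", "PN"), ("eats", "VV"), ("apples", "NN")], ["B-NP", "B-VP", "B-NP"]),
]
unlabeled_pool = [
    [("he", "PN"), ("eats", "VV"), ("pears", "NN")],
    [("she", "PN"), ("runs", "VV")],
]
tagger_a, tagger_b, grown, left = co_train(labeled_seed, unlabeled_pool)
print(len(grown), len(left))
```

In this toy run the first unlabeled sentence is added because both views tag it identically, while the second stays in the pool because the word view has never seen its words. A self-training baseline, which the abstract uses for comparison, would instead let a single classifier add its own predictions.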
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.