Active Learning for Chinese Dependency Parsing (基于主动学习的中文依存句法分析)

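Cite this article:	CHE Wanxiang, ZHANG Meishan, LIU Ting. Active Learning for Chinese Dependency Parsing (基于主动学习的中文依存句法分析)[J]. Journal of Chinese Information Processing (中文信息学报), 2012, 26(2): 18-23.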

Authors:	CHE Wanxiang (车万翔), ZHANG Meishan (张梅山), LIU Ting (刘挺)

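Affiliation:	Research Center for Social Computing and Information Retrieval, School of Computer Science, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China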

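Funding:	Key Program of the National Natural Science Foundation of China (61133012); National Natural Science Foundation of China (60803093); National 863 Major Project (2011AA01A207); Major Special Project on Core Electronic Devices, High-end Generic Chips and Basic Software (核高基) (2011ZX01042-001-001); Harbin Institute of Technology Scientific Research Innovation Fund (HIT.NSRIF.2009069); Fundamental Research Funds for the Central Universities (HIT.KLOF.2010064)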

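Abstract:	Dependency parsing still relies mainly on supervised machine learning, which requires a large, high-quality treebank as training data. Chinese dependency treebank resources are currently relatively scarce, and treebank annotation is time-consuming and labor-intensive. Given a large amount of unannotated text, this paper applies active learning to Chinese dependency parsing, preferentially selecting the instances that the parsing model predicts least reliably and handing them to human annotators. The paper proposes and compares several criteria for measuring the confidence of a dependency parsing model's predictions. Experiments show that, compared with randomly selecting instances to annotate, active learning improves Chinese dependency parsing performance by up to 0.8% when the same number of training instances is used. They also show that active learning reaches the same accuracy with fewer annotated instances, reducing the manual annotation effort by up to 30%.
Keywords:	active learning; dependency parsing; uncertainty measure; committee voting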
Active Learning for Chinese Dependency Parsing

CHE Wanxiang, ZHANG Meishan, LIU Ting. Active Learning for Chinese Dependency Parsing[J]. Journal of Chinese Information Processing, 2012, 26(2): 18-23.

Authors:	CHE Wanxiang, ZHANG Meishan, LIU Ting

Affiliation:	Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China

Abstract:	Building a statistical dependency parser requires a large annotated treebank, but acquiring one is time-consuming, tedious, and expensive. This paper presents a method that reduces this demand through active learning, which selects the most uncertain samples for annotation instead of annotating the whole training corpus. Experiments on the HIT-CIR-CDT show that, with the same number of training samples, active learning raises parsing accuracy by about 0.8 percent. Conversely, to reach about the same parsing accuracy, only 70% of the samples needed under the usual random selection method have to be annotated.

Keywords:	active learning; dependency parsing; uncertainty-based sampling; query-by-committee
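
The two selection strategies named in the keywords can be illustrated with a minimal sketch. The abstract does not state which confidence criteria the paper actually uses, so the measures below (average per-token entropy over head-attachment probabilities for uncertainty-based sampling, and vote entropy over a committee's predicted heads for query-by-committee) are common choices assumed here for illustration only. The function names `normalized_entropy`, `vote_entropy`, and `select_for_annotation` are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(head_probs):
    """Uncertainty score for one sentence (assumed measure, not the paper's).

    head_probs holds one probability distribution per token over its
    candidate heads. The score is the entropy averaged over tokens, so
    long sentences are not favored merely for being long.
    """
    total = 0.0
    for dist in head_probs:
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(head_probs)


def vote_entropy(committee_heads):
    """Disagreement score for one sentence (assumed measure, not the paper's).

    committee_heads holds one predicted head sequence per committee member.
    For each token, the entropy of the members' votes is computed, then
    averaged over tokens. Full agreement yields 0.
    """
    n_members = len(committee_heads)
    n_tokens = len(committee_heads[0])
    total = 0.0
    for i in range(n_tokens):
        votes = Counter(heads[i] for heads in committee_heads)
        total -= sum((c / n_members) * math.log(c / n_members)
                     for c in votes.values())
    return total / n_tokens


def select_for_annotation(scores, k):
    """Return the indices of the k highest-scoring (least confident) sentences."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy pool of two sentences: the parser is confident about the first
# and unsure about the second, so the second is sent to the annotator.
confident = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]
uncertain = [[0.40, 0.30, 0.30], [0.50, 0.50, 0.00]]
pool_scores = [normalized_entropy(confident), normalized_entropy(uncertain)]
chosen = select_for_annotation(pool_scores, k=1)
```

In a full active-learning loop, the selected sentences would be annotated by hand, added to the training set, and the parser retrained before the remaining pool is scored again.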
This article is indexed in CNKI, Wanfang Data, and other databases.