Confidence Based Active Learning Model for Tibetan Person Name Recognition

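Cite this article:	WANG Zhijuan, LIU Feifei, ZHAO Xiaobing, SONG Wei. Confidence Based Active Learning Model for Tibetan Person Name Recognition[J]. Journal of Chinese Information Processing, 2019, 33(8): 53-59.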

Authors:	WANG Zhijuan, LIU Feifei, ZHAO Xiaobing, SONG Wei

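Affiliation:	1. School of Information Engineering, Minzu University of China, Beijing 100081; 2. National Language Resource Monitoring & Research Center of Minority Languages, Beijing 100081; 3. Tomorrow Advancing Life Education Group, Beijing 100080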

Funding:	National Natural Science Foundation of China (61331013, 61501529)

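Abstract:	The cost of annotating training corpora is a major problem in language processing research for low-resource languages. Active learning can select informative, non-redundant data for manual annotation and thereby greatly reduce the annotation cost. Based on the annotation confidence given by a CRF model, this paper proposes four active learning methods and determines their parameters experimentally. The experiments show that when data with confidence below 0.7 is selected for manual annotation until the difference between the labeling results of the new and old models falls below 0.01%, only 6 iterations are needed. With 3.2MB of manually annotated data, the F-measure of Tibetan person name recognition reaches 88%. To reach the same performance, a supervised CRF model needs about 10MB of annotated data, so the active learning method reduces the annotation scale by about 66%.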
Keywords:	Tibetan person name recognition; active learning; confidence
Confidence Based Active Learning Model for Tibetan Person Name Recognition

WANG Zhijuan, LIU Feifei, ZHAO Xiaobing, SONG Wei. Confidence Based Active Learning Model for Tibetan Person Name Recognition[J]. Journal of Chinese Information Processing, 2019, 33(8): 53-59.

Authors:	WANG Zhijuan, LIU Feifei, ZHAO Xiaobing, SONG Wei

Affiliation:	1.School of Electronics Engineering, Minzu University of China, Beijing 100081, China; 2.National Language Resource Monitoring & Research Center of Minority Languages, Beijing 100081, China; 3.Tomorrow Advancing Life Education Group, Beijing 100080, China

Abstract:	Annotating training data is costly for low-resource languages, and active learning is a promising remedy because it selects informative, non-redundant data for manual annotation. Four active learning methods based on the annotation confidence of a CRF model are proposed, with their parameters determined empirically. In the experiments, data with confidence below 0.7 is selected for annotation until the labeling results of the new and old models differ by less than 0.01%, which takes only 6 iterations. With 3.2MB of annotated training data, the F-measure for Tibetan person name recognition reaches 0.88. A supervised CRF model needs about 10MB of annotated data to reach the same performance, so the active learning approach reduces the annotation scale by about 66%.

Keywords:	Tibetan person name recognition; active learning; confidence
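
The select-annotate-retrain loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it covers only the threshold-based selection, not all four proposed methods. Only the 0.7 confidence threshold and the 0.01% stopping value come from the abstract. The function names, the token-level difference metric, the `check_set` on which old and new models are compared, the `max_rounds` cap, and the toy lexicon tagger standing in for the CRF are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Labels = Tuple[str, ...]
Example = Tuple[str, Labels]                    # (sentence, gold labels)
Model = Callable[[str], Tuple[Labels, float]]   # sentence -> (labels, confidence)


def select_low_confidence(model: Model, pool: List[str],
                          threshold: float = 0.7) -> List[str]:
    """Pick pool sentences that the current model labels with confidence below threshold."""
    return [s for s in pool if model(s)[1] < threshold]


def difference_rate(old: Model, new: Model, sentences: List[str]) -> float:
    """Fraction of tokens labeled differently by the two models (assumed metric)."""
    changed = total = 0
    for s in sentences:
        old_labels, new_labels = old(s)[0], new(s)[0]
        total += len(new_labels)
        changed += sum(a != b for a, b in zip(old_labels, new_labels))
    return changed / total if total else 0.0


def active_learning(train: Callable[[List[Example]], Model],
                    oracle: Callable[[str], Labels],
                    seed: List[Example], pool: List[str], check_set: List[str],
                    threshold: float = 0.7, min_diff: float = 0.0001,
                    max_rounds: int = 20) -> Tuple[Model, List[Example], int]:
    """Repeat select -> annotate -> retrain until old and new models nearly agree."""
    labeled, pool = list(seed), list(pool)
    model = train(labeled)
    rounds = 0
    while pool and rounds < max_rounds:
        picked = select_low_confidence(model, pool, threshold)
        if not picked:
            break                                    # nothing uncertain is left
        labeled += [(s, oracle(s)) for s in picked]  # manual annotation step
        picked_set = set(picked)
        pool = [s for s in pool if s not in picked_set]
        new_model = train(labeled)
        rounds += 1
        diff = difference_rate(model, new_model, check_set)
        model = new_model
        if diff < min_diff:                          # 0.01% written as a fraction
            break
    return model, labeled, rounds


def train_toy(labeled: List[Example]) -> Model:
    """Toy stand-in for CRF training: a token lexicon, NOT a real CRF."""
    lexicon: Dict[str, str] = {}
    for sentence, labels in labeled:
        lexicon.update(zip(sentence.split(), labels))

    def model(sentence: str) -> Tuple[Labels, float]:
        tokens = sentence.split()
        labels = tuple(lexicon.get(t, "O") for t in tokens)
        known = sum(t in lexicon for t in tokens)
        # Confidence here is just the share of known tokens (an assumption).
        return labels, (known / len(tokens) if tokens else 1.0)

    return model
```

In a real setting, `train` would wrap a CRF trainer that exposes a per-sentence confidence, and `oracle` would be the human annotator rather than a lookup table.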
This article is indexed in VIP and other databases.