Data Augmentation Method for New Type Person Named Entity Recognition (面向新类型人名识别的数据增强方法)

Citation:	SONG Xiliang, HAN Xianpei, SUN Le. Data Augmentation Method for New Type Person Named Entity Recognition [J]. Journal of Chinese Information Processing (中文信息学报), 2019, 33(6): 72-79.

Authors:	SONG Xiliang (宋希良), HAN Xianpei (韩先培), SUN Le (孙乐)

Affiliation:	1. Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China

Funding:	National Natural Science Foundation of China (61433015, 61572477, 61772505); Young Elite Scientists Sponsorship Program by CAST (YESS20160177)

Abstract:	Person name recognition is often performed as part of named entity recognition (NER), alongside other entity types. Current NER-based person name recognition depends on how well the training corpus covers a particular type of person name, and performance degrades significantly when a new type of person name is encountered. To address this problem, we propose a data augmentation method that generates pseudo training data by replacing the common person name entities in the training data with entities of the new type, which effectively improves recognition of new types of person names. To select representative names of a specific type, we also propose a greedy representative subtype name selection algorithm. We evaluate on two test sets: pseudo test data generated automatically from the 1998 People's Daily corpus, and manually annotated news data. Across multiple models, the F₁ score for person name recognition improves by at least 12 and 6 percentage points, respectively.

Keywords:	person name recognition; data augmentation; new type of person name
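
The entity-replacement strategy summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character-level BIO tags (`B-PER`, `I-PER`, `O`), the function names `replace_person_names` and `greedy_select`, and the sample names are all assumptions made for illustration. In particular, the abstract does not state the criterion used by the greedy selection algorithm, so `greedy_select` substitutes a simple character-coverage heuristic as one plausible reading.

```python
import random
from typing import List, Tuple

# A sentence is a list of (character, BIO tag) pairs; this format is an assumption.
Sentence = List[Tuple[str, str]]


def replace_person_names(sentence: Sentence, new_names: List[str],
                         rng: random.Random) -> Sentence:
    """Return a copy of `sentence` with each PER span swapped for a random new-type name."""
    out: Sentence = []
    i = 0
    while i < len(sentence):
        char, tag = sentence[i]
        if tag == "B-PER":
            # Skip the whole original span: B-PER followed by any I-PER tags.
            i += 1
            while i < len(sentence) and sentence[i][1] == "I-PER":
                i += 1
            name = rng.choice(new_names)
            # Re-tag the substituted name character by character.
            out.extend((c, "B-PER" if k == 0 else "I-PER")
                       for k, c in enumerate(name))
        else:
            out.append((char, tag))
            i += 1
    return out


def greedy_select(names: List[str], k: int) -> List[str]:
    """Hypothetical heuristic: greedily pick up to k names, each time taking
    the name that adds the most characters not yet covered."""
    covered: set = set()
    chosen: List[str] = []
    pool = list(dict.fromkeys(names))  # de-duplicate while keeping order
    while pool and len(chosen) < k:
        best = max(pool, key=lambda n: len(set(n) - covered))
        if not set(best) - covered:
            break  # no remaining name adds anything new
        chosen.append(best)
        covered |= set(best)
        pool.remove(best)
    return chosen


# Usage: pick representative names, then generate one pseudo training sentence.
candidates = ["田中太郎", "田中", "山本一郎", "太郎"]
representatives = greedy_select(candidates, k=2)
original = [("李", "B-PER"), ("明", "I-PER"), ("说", "O"), ("话", "O")]
augmented = replace_person_names(original, representatives, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the generated pseudo data reproducible. Non-person tokens and their tags are copied unchanged, so only the person name spans differ between the original and the augmented sentence.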