
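Cross-Domain Named Entity Recognition Based on Automatically Weakly Annotated Data
Cite this article: FANG Yewei, WANG Mingtao, CHEN Wenliang, ZHANG Yitian, ZHANG Min. Cross-domain named entity recognition based on automatically weakly annotated data[J]. Journal of Chinese Information Processing (中文信息学报), 2022, 36(3): 73-81, 90.
Authors: FANG Yewei (方晔玮)  WANG Mingtao (王铭涛)  CHEN Wenliang (陈文亮)  ZHANG Yitian (张熠天)  ZHANG Min (张民)
Affiliation: 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006;
2. National Industrial Information Security Development Research Center, Beijing 100043
Funding: National Natural Science Foundation of China (61525205, 61876115)
Abstract: In recent years, neural network models trained on large-scale annotated corpora have greatly improved the performance of named entity recognition. However, manually annotated data for a new domain is expensive to obtain, so fast, low-cost domain transfer is very important. Given only unlabeled data in the target domain, this paper attempts to automatically construct a weakly annotated corpus for the target domain and to model it. First, two different methods are used to annotate the unlabeled data automatically; then, a keep-the-"same", drop-the-"different...

Keywords: named entity recognition  domain adaptation  partial annotation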

Cross-Domain NER Using Automatically Partial-annotated Data
FANG Yewei, WANG Mingtao, CHEN Wenliang, ZHANG Yitian, ZHANG Min. Cross-Domain NER Using Automatically Partial-annotated Data[J]. Journal of Chinese Information Processing, 2022, 36(3): 73-81, 90.
Authors: FANG Yewei  WANG Mingtao  CHEN Wenliang  ZHANG Yitian  ZHANG Min
Affiliation: 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China;
2. National Industrial Information Security Development Research Center, Beijing 100043, China
Abstract: In recent years, neural network models trained on large amounts of labeled data have substantially improved the performance of named entity recognition (NER). However, collecting enough labeled samples for each new domain is expensive, which makes fast, low-cost domain transfer important. Given only unlabeled data from a target domain, this paper attempts to automatically construct a partially annotated corpus for the target domain and to train a model on it. First, we use two different methods to annotate the unlabeled data automatically. Then, we keep the annotations on which the two methods agree and discard those on which they disagree, which reduces erroneous annotations as far as possible and yields a partially annotated corpus. Finally, we propose a new entity recognition model based on partial annotation learning. Experimental results on transfer from the news domain to the social-media and finance domains show that the proposed approach effectively improves the domain adaptation performance of NER at low cost. The method still performs well when the pre-trained language model BERT is added.
Keywords: named entity recognition  domain adaptation  partial annotation
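
The "keep consistent annotations, remove differing ones" step described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the BIO tag format, agreement judged at the level of whole entity spans, the `UNKNOWN` placeholder for positions left unannotated, and the function names `bio_to_spans` and `merge_annotations`. The abstract does not say how the two automatic annotators work or how the model treats the unannotated positions during training.

```python
from typing import List, Set, Tuple

UNKNOWN = "?"  # hypothetical placeholder for tokens left unannotated

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def merge_annotations(tags_a: List[str], tags_b: List[str]) -> List[str]:
    """Keep what two automatic annotators agree on; mark the rest unknown."""
    if len(tags_a) != len(tags_b):
        raise ValueError("tag sequences must have the same length")
    merged = [UNKNOWN] * len(tags_a)
    # Tokens that both annotators mark as non-entity stay "O".
    for i, (a, b) in enumerate(zip(tags_a, tags_b)):
        if a == "O" and b == "O":
            merged[i] = "O"
    # Spans with identical boundaries and type in both annotations are kept.
    for start, end, etype in bio_to_spans(tags_a) & bio_to_spans(tags_b):
        merged[start] = f"B-{etype}"
        for i in range(start + 1, end):
            merged[i] = f"I-{etype}"
    return merged


if __name__ == "__main__":
    a = ["B-PER", "I-PER", "O", "B-LOC", "O"]
    b = ["B-PER", "I-PER", "O", "O", "O"]
    print(merge_annotations(a, b))
```

In this sketch the two annotators agree on the person span and on two non-entity tokens, so those labels are kept, while the token that only one annotator tags as a location is left as `UNKNOWN`. The result is a partially annotated sentence of the kind the abstract describes.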