
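中文嵌套命名实体识别语料库的构建 (Construction of a Chinese Nested Named Entity Recognition Corpus)
Cite this article:李雁群,何云琪,钱龙华,周国栋.中文嵌套命名实体识别语料库的构建[J].中文信息学报 (Journal of Chinese Information Processing),2018,32(8):19-26.
Authors:李雁群 (LI Yanqun)  何云琪 (HE Yunqi)  钱龙华 (QIAN Longhua)  周国栋 (ZHOU Guodong)
Affiliation:1.Natural Language Processing Laboratory, Soochow University, Suzhou, Jiangsu 215006, China;
2.School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Funding:National Natural Science Foundation of China (61373096, 61331011, 61673290)
Abstract (Chinese original, translated):Nested named entities carry rich information about entities and the semantic relations between them, which helps improve the efficiency of information extraction. Because there is no unified, standard corpus of Chinese nested named entities, current work on Chinese nested named entities is difficult to compare. Building on existing named entity corpora, this paper constructs two Chinese nested named entity corpora with a semi-automatic method. The annotations in existing Chinese named entity corpora are first used to automatically construct as many nested named entities as possible; these are then manually adjusted to meet the annotation requirements for Chinese nested entities, yielding high-quality corpora for Chinese nested named entity recognition. Preliminary within-corpus and cross-corpus experiments on nested entity recognition show that Chinese nested named entity recognition remains a fairly difficult problem that requires further study.

Keywords (Chinese original, translated):Chinese nested named entity recognition  conditional random fields  information extraction  corpus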

Chinese Nested Named Entity Recognition Corpus Construction
LI Yanqun,HE Yunqi,QIAN Longhua,ZHOU Guodong.Chinese Nested Named Entity Recognition Corpus Construction[J].Journal of Chinese Information Processing,2018,32(8):19-26.
Authors:LI Yanqun  HE Yunqi  QIAN Longhua  ZHOU Guodong
Affiliation:1.Natural Language Processing Laboratory, Soochow University, Suzhou, Jiangsu 215006, China;
2.School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Abstract:Nested named entities contain rich information about entities and the semantic relations between them, which helps improve the effectiveness of information extraction. Because no uniform, standard Chinese nested named entity corpus exists, research on Chinese nested named entities is currently difficult to compare. Building on existing named entity corpora, this paper uses a semi-automatic method to construct two Chinese nested named entity corpora. First, the annotation information in existing Chinese named entity corpora is used to automatically construct as many nested named entities as possible; these are then manually adjusted to meet our annotation requirements for Chinese nested entities, producing high-quality Chinese nested named entity corpora. Preliminary experiments on nested named entity recognition, both within and across the corpora, show that Chinese nested named entity recognition remains a difficult problem that requires further research.
Keywords:Chinese nested named entity recognition  conditional random fields  information extraction  corpus  
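
The automatic step described in the abstract (deriving nested entity candidates from existing flat annotations before manual adjustment) can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not detail; it assumes one plausible heuristic, namely that any substring of an annotated entity that is itself annotated as an entity elsewhere in the corpus is proposed as an inner entity. The names `Entity`, `build_lexicon`, and `propose_nested`, the labels `ORG` and `LOC`, and the sample sentence are all hypothetical.

```python
from collections import namedtuple

# Hypothetical flat annotation: character offsets, end exclusive.
Entity = namedtuple("Entity", "start end label")

def build_lexicon(corpus):
    """Map each annotated surface string to the set of labels it was seen with."""
    lexicon = {}
    for text, entities in corpus:
        for ent in entities:
            lexicon.setdefault(text[ent.start:ent.end], set()).add(ent.label)
    return lexicon

def propose_nested(text, entities, lexicon):
    """Propose inner-entity candidates found inside each annotated entity.

    Assumed heuristic: a proper substring of an outer entity becomes a
    candidate if the same string is annotated as an entity somewhere in
    the corpus. Candidates are meant for manual review, not final labels.
    """
    candidates = []
    for outer in entities:
        for i in range(outer.start, outer.end):
            for j in range(i + 1, outer.end + 1):
                if (i, j) == (outer.start, outer.end):
                    continue  # skip the outer entity itself
                for label in sorted(lexicon.get(text[i:j], ())):
                    candidates.append((outer, Entity(i, j, label)))
    return candidates

# Toy corpus: one sentence with two flat entity annotations.
corpus = [
    ("苏州大学位于苏州", [Entity(0, 4, "ORG"), Entity(6, 8, "LOC")]),
]
lexicon = build_lexicon(corpus)
text, entities = corpus[0]
nested = propose_nested(text, entities, lexicon)
for outer, inner in nested:
    print(text[outer.start:outer.end], outer.label, "contains",
          text[inner.start:inner.end], inner.label)
```

In this toy case the location 苏州 is proposed as an entity nested inside the organization 苏州大学. A heuristic of this kind over-generates, which is consistent with the abstract's point that a manual adjustment pass follows the automatic construction.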