Chinese Named Entity Recognition Based on Text Representation Multi-granularity
Cite this article: TIAN Yu, ZHANG Guiping, CAI Dongfeng, CHEN Huawei, SONG Yan. Chinese Named Entity Recognition Based on Text Representation Multi-granularity[J]. Journal of Chinese Information Processing, 2022, 36(4): 90-99.
Authors: TIAN Yu, ZHANG Guiping, CAI Dongfeng, CHEN Huawei, SONG Yan
Affiliation: 1. Human-Computer Intelligence Research Center, Shenyang Aerospace University, Shenyang, Liaoning 110136, China;
2. School of Data Science, The Chinese University of Hong Kong (Shenzhen), Shenzhen, Guangdong 518172, China
Funding: National Natural Science Foundation of China (U1908216); Liaoning Province Key Research and Development Program (2019JH2/10100020)
Abstract: Chinese named entity recognition models commonly take character embeddings as the input to a neural network. Because Chinese has no explicit word boundaries, character-only input can lose part of the semantic information. To address this issue, the paper proposes a Chinese named entity recognition model based on multi-granularity text representations. First, character and word representations are combined at the model input. An N-gram encoder is then used to mine the potential word information in N-grams, so that three granularities of text representation (characters, words, and N-grams) are jointly used to enrich the contextual representation of the sequence. On the Weibo, Resume, and OntoNotes4 datasets, the model reaches F1 scores of 72.41%, 96.52%, and 82.83%, respectively, outperforming the baseline models.
Keywords: Chinese named entity recognition; multi-granularity text representation; N-gram
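
The abstract refers to mining potential word information from N-grams. As a rough illustration of what that input looks like, the sketch below enumerates the character N-grams that start at each position of a sentence; such N-grams are the candidate "words" in unsegmented Chinese text. This is not the authors' N-gram encoder: the function name `char_ngrams`, the `max_n` parameter, and the example sentence are assumptions made only for this illustration.

```python
from typing import Dict, List


def char_ngrams(sentence: str, max_n: int = 3) -> Dict[int, List[str]]:
    """Map each character position to the N-grams (2 <= n <= max_n) starting there.

    Hypothetical helper for illustration; not taken from the paper.
    """
    grams: Dict[int, List[str]] = {}
    for i in range(len(sentence)):
        # Keep only N-grams that fit entirely inside the sentence.
        grams[i] = [
            sentence[i:i + n]
            for n in range(2, max_n + 1)
            if i + n <= len(sentence)
        ]
    return grams


if __name__ == "__main__":
    # Print the candidate N-grams anchored at each character.
    for position, candidates in char_ngrams("南京市长江大桥", max_n=3).items():
        print(position, candidates)
```

In a model of the kind the abstract describes, each such N-gram would presumably be embedded and encoded alongside the character and word representations; how that combination is performed is specified in the full paper, not here.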