Chinese Overlapping Named Entity Recognition Method Based on Label Clustering
WEN Xiuxiu,MA Chao,GAO Yuanyuan,KANG Zilu.Chinese Overlapping Named Entity Recognition Method Based on Label Clustering[J].Computer Engineering,2020,46(5):41-46.
Authors: WEN Xiuxiu, MA Chao, GAO Yuanyuan, KANG Zilu
Affiliation: Information Science Academy, China Electronics Technology Group Corporation, Beijing 100081, China
Abstract: To address the complex nesting of named entities and the overlapping boundaries of adjacent named entities caused by labeling errors in the corpus, this paper proposes a Chinese overlapping Named Entity Recognition (NER) method. First, a hierarchical clustering algorithm based on random merging and splitting divides the labels of overlapping named entities into different clusters. This establishes a one-to-one correspondence between characters and entity labels and keeps the clustering of entity labels from getting trapped in a local optimum. Then, within each label cluster, a Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) model that incorporates Chinese radicals is used to improve the stability of overlapping NER. Experimental results show that, through label clustering, the proposed method effectively prevents labeling errors from interfering with recognition, improving the F1 score by 0.05 on average compared with existing methods.
Keywords: Named Entity Recognition (NER); entity overlapping; Chinese named entity; label clustering; hierarchical clustering
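The abstract does not spell out the clustering objective, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that two labels "conflict" when their spans share at least one character somewhere in the corpus. It then assumes the goal is the fewest label clusters with no conflicting labels inside any cluster, which gives each character at most one label per cluster (the one-to-one correspondence). The search uses random merges (accepted only if valid) and random splits (to escape poor groupings). The span format and the names `conflict_pairs`, `is_valid`, `cluster_labels`, `bio_per_cluster`, `iters`, `split_prob` and `seed` are all hypothetical.

```python
import random
from itertools import combinations

# Hypothetical data format: each sentence is a list of (start, end_exclusive, label) spans.


def conflict_pairs(corpus):
    """Return label pairs that cover at least one common character somewhere in the corpus."""
    conflicts = set()
    for spans in corpus:
        for (s1, e1, l1), (s2, e2, l2) in combinations(spans, 2):
            # Two half-open spans overlap iff each starts before the other ends.
            if l1 != l2 and s1 < e2 and s2 < e1:
                conflicts.add(frozenset((l1, l2)))
    return conflicts


def is_valid(cluster, conflicts):
    """A cluster is valid if no two of its labels ever overlap (assumed constraint)."""
    return all(frozenset((a, b)) not in conflicts for a, b in combinations(cluster, 2))


def cluster_labels(labels, conflicts, iters=2000, split_prob=0.2, seed=0):
    """Randomized merge/split search for the fewest valid label clusters (assumed objective)."""
    rng = random.Random(seed)
    clusters = [{label} for label in sorted(labels)]  # start from singletons
    best = [set(c) for c in clusters]
    for _ in range(iters):
        if len(clusters) > 1 and rng.random() >= split_prob:
            # Random merge: accept only if the merged cluster has no overlapping labels.
            i, j = rng.sample(range(len(clusters)), 2)
            merged = clusters[i] | clusters[j]
            if is_valid(merged, conflicts):
                clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        else:
            # Random split: break a multi-label cluster in two to escape a poor grouping.
            splittable = [c for c in clusters if len(c) > 1]
            if splittable:
                c = rng.choice(splittable)
                part = set(rng.sample(sorted(c), rng.randint(1, len(c) - 1)))
                clusters.remove(c)
                clusters += [part, c - part]
        if len(clusters) < len(best):
            best = [set(c) for c in clusters]
    return best


def bio_per_cluster(n_chars, spans, clusters):
    """Build one BIO tag sequence per cluster; each character gets at most one label per cluster."""
    sequences = []
    for cluster in clusters:
        seq = ["O"] * n_chars
        for start, end, label in spans:
            if label in cluster:
                seq[start] = f"B-{label}"
                for k in range(start + 1, end):
                    seq[k] = f"I-{label}"
        sequences.append(seq)
    return sequences


if __name__ == "__main__":
    # "中国电子科技集团" (ORG) contains "中国" (LOC); "北京大学的张三": ORG, nested LOC, PER.
    corpus = [
        [(0, 8, "ORG"), (0, 2, "LOC")],
        [(0, 4, "ORG"), (0, 2, "LOC"), (5, 7, "PER")],
    ]
    conflicts = conflict_pairs(corpus)
    labels = {label for spans in corpus for _, _, label in spans}
    clusters = cluster_labels(labels, conflicts)
    for cluster, seq in zip(clusters, bio_per_cluster(7, corpus[1], clusters)):
        print(sorted(cluster), seq)
```

In this toy corpus, ORG and LOC overlap, so they must land in different clusters, while PER can join either one. Each resulting BIO sequence is then a flat, non-overlapping tagging task. In the paper, each such task is handled by a radical-enhanced BiLSTM-CRF, which is not reproduced here. Note that this scheme cannot separate two overlapping spans that carry the same label.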
This article is indexed in databases including VIP and Wanfang Data.