
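Bi-LSTM-WCRF Chinese Person Name Recognition with Dictionary Features
Cite this article: 成于思, 施云涛. Bi-LSTM-WCRF Chinese Person Name Recognition with Dictionary Features [J]. 中文信息学报 (Journal of Chinese Information Processing), 2020, 34(4): 69-76.
Authors: 成于思  施云涛
Affiliations: 1. School of Civil Engineering, Southeast University, Nanjing, Jiangsu 210096;
2. Cloud Computing R&D Center, Suning Technology Group, Nanjing, Jiangsu 210042
Funding: National Natural Science Foundation of China (71601047); China Postdoctoral Science Foundation (2015M581706)
Abstract: Chinese person name recognition performs poorly because annotated corpora are limited in domain and size and suffer from class imbalance. Person name dictionaries are easier to obtain than annotated training corpora for name recognition, yet how to use them to improve recognition remains to be studied further. This paper extracts features from a person name dictionary and incorporates them into a bi-directional long short-term memory (Bi-LSTM) network, and it designs a weighted conditional random field (WCRF) that raises the weight of person name labels in the loss function. The dictionary supplies feature information on family names and given names, the Bi-LSTM network captures contextual information within the sentence, and the WCRF improves the recall of person name recognition. Experiments on the People's Daily corpus and on a construction law domain corpus show that, on the domain test corpus, the F1 score of person name recognition is 18.34% higher than that of a method based on the hidden Markov model; compared with the conventional Bi-LSTM-CRF model, recall is 15.53% higher and F1 is 8.83% higher. WCRF can also be applied to other sequence labeling or classification problems with class imbalance.

Keywords: person name recognition  bi-directional long short-term memory network  weighted conditional random field  dictionary features  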
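
The abstract states that family-name and given-name information is taken from a person name dictionary, but it does not give the exact feature definition. The following is only a minimal sketch of one plausible reading: each character receives two relative frequencies, one for how often it appears as a surname and one for how often it appears inside a given name. The function names, the toy dictionary, and the assumption that the first character of every entry is the surname (compound surnames are ignored) are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_name_stats(names):
    """Count characters in surname position and in given-name position.

    Assumption: the first character of each entry is the surname.
    """
    surname_counts = Counter()
    given_counts = Counter()
    for name in names:
        if len(name) < 2:
            continue  # skip entries too short to split
        surname_counts[name[0]] += 1
        given_counts.update(name[1:])
    return surname_counts, given_counts


def char_features(sentence, surname_counts, given_counts):
    """Return (surname frequency, given-name frequency) for each character."""
    total_surname = sum(surname_counts.values()) or 1
    total_given = sum(given_counts.values()) or 1
    return [
        (surname_counts[ch] / total_surname, given_counts[ch] / total_given)
        for ch in sentence
    ]


# Toy dictionary, purely illustrative.
stats = build_name_stats(["王小明", "王芳", "李明"])
features = char_features("王明说", *stats)
```

In a model such as the one described, per-character values like these would typically be concatenated with the character embedding before the Bi-LSTM layer; how the paper actually combines them is not stated in this record.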

Bi-LSTM-WCRF Incorporating Dictionary Feature for Chinese Person Name Recognition
CHENG Yusi,SHI Yuntao.Bi-LSTM-WCRF Incorporating Dictionary Feature for Chinese Person Name Recognition[J].Journal of Chinese Information Processing,2020,34(4):69-76.
Authors:CHENG Yusi  SHI Yuntao
Affiliation:1.School of Civil Engineering, Southeast University, Nanjing, Jiangsu 210096, China;
2.Cloud Computing Research Center, Suning Technology Corporation, Nanjing, Jiangsu 210042, China
Abstract:Chinese person name recognition is restricted by the domain and size of the existing annotated corpora and by class imbalance. Person name dictionaries and domain dictionaries are easier to obtain than manually annotated training corpora. This article incorporates dictionaries into a bi-directional long short-term memory (Bi-LSTM) network with a weighted conditional random field (WCRF) layer. The model extracts the likelihood of a character being a family name or a given name from person name dictionaries, and the domain dictionaries provide further information on person names. The Bi-LSTM captures context information, and the weighted conditional random field improves the recall of person name recognition. Experiments on the People's Daily corpus and a construction law corpus show that, compared with the existing method based on the hidden Markov model, the F1 value of person name recognition improves by 18.34%; compared with the traditional Bi-LSTM-CRF model, recall increases by 15.53% and F1 increases by 8.83%.
Keywords:person name recognition  bi-directional long short-term memory network  weighted conditional random field  dictionary features  
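
The abstract says that the weight of person name labels is raised in the loss function to improve recall, without giving the WCRF formula. The sketch below therefore does not reproduce the WCRF. It illustrates the same idea in a simpler, token-level form: a class-weighted negative log-likelihood in which errors on the rare person-name tag cost more than errors on other tags. The tag names `PER` and `O` and the weight 3.0 are hypothetical choices for illustration.

```python
import math


def weighted_nll(log_probs, gold_tags, tag_weights):
    """Class-weighted negative log-likelihood over one sentence.

    log_probs:   list of dicts mapping tag -> log probability, one per token
    gold_tags:   list of gold tags, one per token
    tag_weights: dict mapping tag -> weight; missing tags default to 1.0
    """
    loss = 0.0
    for token_log_probs, tag in zip(log_probs, gold_tags):
        weight = tag_weights.get(tag, 1.0)
        loss -= weight * token_log_probs[tag]
    return loss


# Two tokens with uniform predictions over two tags (illustrative values).
uniform = {"PER": math.log(0.5), "O": math.log(0.5)}
loss = weighted_nll([uniform, uniform], ["PER", "O"], {"PER": 3.0})
```

With a weight above 1 on the person-name tag, a model trained on such a loss is pushed to miss fewer names, usually at some cost in precision; this matches the recall gain the abstract reports, although the paper applies the weighting inside a CRF layer rather than per token.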