Named Entity Recognition for Chinese Electronic Medical Record by Fusing Semantic and Boundary Information
Cite as: CUI Shaoguo, CHEN Junhua, LI Xiaohong. Named Entity Recognition for Chinese Electronic Medical Record by Fusing Semantic and Boundary Information [J]. 电子科技大学学报(自然科学版) (Journal of University of Electronic Science and Technology of China, Natural Science Edition), 2022, 51(4): 565-571.
Authors: CUI Shaoguo (崔少国), CHEN Junhua (陈俊桦), LI Xiaohong (李晓虹)
Affiliation: College of Computer and Information Science, Chongqing Normal University, Shapingba District, Chongqing 401331
Funding: Science and Technology Project of the Chongqing Municipal Education Commission (KJQN201800539, KJQN202000510); Humanities and Social Sciences Project of the Ministry of Education (18XJC880002)
Abstract: Chinese electronic medical record texts are highly specialized and grammatically complex, which makes named entity recognition (NER), a natural language processing (NLP) task, difficult. To accurately identify medical entities in electronic medical record data, a named entity recognition algorithm that fuses semantic and boundary information is proposed. First, a convolutional neural network (CNN) extracts the graphic information of Chinese characters, which is concatenated with Wubi features to enrich the semantic information of each character. Next, the Lattice in the FLAT model uses a medical dictionary to supply potential words for the characters when matching the text. Finally, the Lattice model enriched with this semantic information is applied to named entity recognition in Chinese electronic medical records. Experimental results show that the method outperforms several existing algorithms on the Yidu-S4K dataset and reaches an F1 score of 96.06% on the Resume dataset.

Keywords: Chinese electronic medical record; FLAT; medical dictionary; named entity recognition; natural language processing
Received: 2021-11-30
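
The dictionary-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `match_lexicon_words` and `build_flat_lattice`, the `max_len` parameter, and the toy lexicon are all hypothetical. The sketch only assumes the general idea of a flat lattice, in which every character and every dictionary word found in the sentence becomes a token tagged with the positions of its first (head) and last (tail) character, so that word boundaries are available to the model.

```python
from typing import List, Set, Tuple

Token = Tuple[str, int, int]  # (text, head index, tail index), both inclusive


def match_lexicon_words(text: str, lexicon: Set[str], max_len: int = 6) -> List[Token]:
    """Find every lexicon word of 2..max_len characters that occurs in text."""
    spans: List[Token] = []
    for head in range(len(text)):
        # tail is the inclusive index of the last character of the candidate
        for tail in range(head + 1, min(head + max_len, len(text))):
            word = text[head:tail + 1]
            if word in lexicon:
                spans.append((word, head, tail))
    return spans


def build_flat_lattice(text: str, lexicon: Set[str]) -> List[Token]:
    """Characters first (head == tail), then the matched dictionary words."""
    chars: List[Token] = [(ch, i, i) for i, ch in enumerate(text)]
    return chars + match_lexicon_words(text, lexicon)


# Toy medical lexicon and sentence, invented for illustration only.
toy_lexicon = {"糖尿病", "2型糖尿病", "患者", "胰岛素"}
sentence = "2型糖尿病患者"

for token in build_flat_lattice(sentence, toy_lexicon):
    print(token)
```

In this toy run the overlapping entries "2型糖尿病" and "糖尿病" are both kept, which reflects the point of a lattice: candidate words are not disambiguated in advance but passed to the model together with their boundaries.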