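Named entity recognition for Chinese electronic medical records integrating multi-feature embedding and an attention mechanism
Authors: GONG Dun-wei (巩敦卫), ZHANG Yong-kai (张永凯), GUO Yi-nan (郭一楠), WANG Bin (王斌), FAN Kuan-lu (樊宽鲁), HUO Yan (火焱)
Affiliation: 1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116
Funding: National Natural Science Foundation of China (61973305, 61773384); Fundamental Research Funds for the Central Universities, China University of Mining and Technology (2020ZDPY0302)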
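Abstract: Chinese electronic medical record text contains many nested entities, has complex sentence syntax, and tends to use short sentences. To recognize its medical entities effectively, a named entity recognition algorithm that integrates multi-feature embedding with an attention mechanism is proposed. The input representation layer fuses features at three granularities (characters, words, and glyphs), and an attention mechanism is introduced into the hidden layer of the bidirectional long short-term memory network so that the algorithm pays more attention to characters related to medical entities when capturing features. The algorithm finally produces an optimal labeling of five types of entities in Chinese electronic medical records: diseases, body parts, symptoms, drugs, and operations. In experiments on open-source and self-built diabetes datasets, the accuracy, recall, and F1 value of the proposed algorithm all exceed 97%, indicating that it identifies the various entities in Chinese electronic medical records more effectively.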
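The two mechanisms named in the abstract, feature fusion at the input layer and attention over hidden states, can be illustrated with a minimal pure-Python sketch. The abstract does not specify how the features are fused or which form of attention is used, so concatenation and scaled dot-product attention are assumptions here, and the function names `fuse_features` and `attend` are hypothetical. The fused vectors stand in for the BiLSTM hidden states, which a real model would compute with a recurrent layer.

```python
import math


def fuse_features(char_vec, word_vec, glyph_vec):
    """Concatenate character, word, and glyph vectors (assumed fusion method)."""
    return list(char_vec) + list(word_vec) + list(glyph_vec)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, query_index):
    """Scaled dot-product attention for one position (assumed attention form).

    Returns the attention weights over all positions and the weighted
    sum of the hidden states (the context vector).
    """
    query = hidden_states[query_index]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, h)) / scale for h in hidden_states]
    weights = softmax(scores)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(len(query))
    ]
    return weights, context


# Toy three-character sentence; all numbers are made up for illustration.
fused = [
    fuse_features([0.1, 0.2], [0.3], [0.4, 0.5]),
    fuse_features([0.9, 0.1], [0.2], [0.7, 0.3]),
    fuse_features([0.0, 0.4], [0.6], [0.1, 0.8]),
]
weights, context = attend(fused, query_index=1)
```

Each position receives a weight that sums to one across the sentence, so positions whose vectors align with the query contribute more to the context vector. This is the sense in which attention lets a model concentrate on some characters more than others.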

Keywords: Chinese; electronic medical records; named entity recognition; multi-feature embedding; attention mechanism
Received: 2021-01-12

Named entity recognition of Chinese electronic medical records based on multifeature embedding and attention mechanism
Authors:GONG Dun-wei  ZHANG Yong-kai  GUO Yi-nan  WANG Bin  FAN Kuan-lu  HUO Yan
Affiliation: 1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China; 2. Intelligent Medical Center, Institute of Artificial Intelligence, China University of Mining and Technology, Xuzhou 221116, China; 3. Department of Endocrinology, the Second Affiliated Hospital of Xuzhou Medical University, Xuzhou 221000, China; 4. Department of Endocrinology, Affiliated Hospital of China University of Mining and Technology, Xuzhou 221116, China
Abstract: Medical records are an essential part of residents' health care records and hold all the information about patients' clinical treatment. They were traditionally written by doctors on paper. With the development of information technology, electronic medical records, which are easier to store and manage, are gradually replacing paper ones. Intelligent auxiliary diagnosis, patient portrait construction, and disease prediction based on medical reports have become research hotspots in intelligent medical care. An efficient named entity recognition algorithm is key to discovering the hidden relationships between symptoms and diseases in the documents stored in electronic medical records. Although several studies have addressed named entity recognition, relatively little research covers information extraction from Chinese electronic medical records. To the best of our knowledge, the documents in Chinese electronic medical records contain a large number of nested named entities and short sentences. Moreover, the logical connection between sentences is weak, which makes the syntactic structure complex. To recognize the medical entities effectively, a novel named entity recognition method based on multifeature embedding and an attention mechanism is proposed. Features of three types, derived from characters, words, and glyphs, are fused in the input representation layer, and an attention mechanism is introduced into the hidden layer of the bidirectional long short-term memory network so that the model focuses on the characters related to medical entities. Finally, the optimal labels for five types of entities in Chinese electronic medical records (diseases, body parts, symptoms, drugs, and operations) are obtained. Experimental results on open and self-built Chinese electronic medical record datasets show that the recognition accuracy, recall, and F1 value of the proposed algorithm all exceed 97%, indicating that the algorithm can effectively identify the various entities in Chinese electronic medical records.
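To make the reported metrics concrete, the following is a minimal sketch of how recall and F1 are commonly computed for named entity recognition. It assumes a BIO tagging scheme and exact-match scoring at the entity level, and it treats the first reported figure as precision; the abstract specifies none of these. The tag names `DIS` and `SYM` and both function names are hypothetical.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence."""
    entities = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def precision_recall_f1(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 under exact span matching."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the prediction finds the disease span but misses the symptom.
gold = ["B-DIS", "I-DIS", "O", "B-SYM", "O"]
pred = ["B-DIS", "I-DIS", "O", "O", "O"]
p, r, f1 = precision_recall_f1(gold, pred)
```

In this toy case the single predicted entity is correct, while only one of the two gold entities is recovered. F1 is the harmonic mean of precision and recall, so it is pulled toward the lower of the two.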
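Keywords: Chinese; electronic medical records; named entity recognition; multifeature embedding; attention mechanism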
This article is indexed in CNKI, Wanfang Data, and other databases.