
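Multi-granularity Medical Entity Recognition for Chinese Electronic Medical Records (面向中文电子病历的多粒度医疗实体识别)
Cite this article: 周晓进, 徐陈铭, 阮彤. 面向中文电子病历的多粒度医疗实体识别[J]. 计算机科学, 2021, 48(4): 237-242.
Authors: 周晓进 (ZHOU Xiao-jin)  徐陈铭 (XU Chen-ming)  阮彤 (RUAN Tong)
Affiliation: School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237; School of Science, East China University of Science and Technology, Shanghai 200237
Funding: National Natural Science Foundation of China project; "Precision Medicine Research" major special project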
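Abstract: In existing named entity recognition tasks for Chinese clinical electronic medical records, entities are usually annotated at a granularity that is either too fine or too coarse. Overly fine annotations are hard to match to practical application scenarios, while overly coarse annotations usually require complex processing before the normalized form and semantic type of each entity can be determined for downstream data mining. To simplify this processing, 9 types of fine-grained analytical entities are defined, based on the characteristics of 7 common types of coarse-grained clinical entities, to explain the coarse-grained entities. In addition, a multi-granularity clinical entity recognition model based on multi-task learning and a self-attention mechanism is proposed to suit the characteristics of multi-granularity entities, and 5000 texts containing multi-granularity entities were annotated from a real hospital electronic medical record database to validate the model. Experimental results show that the model outperforms mainstream sequence labeling models, with F1 scores of 92.88 and 85.48 on the coarse-grained and fine-grained entity recognition tasks, respectively.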

Keywords: Electronic medical records  Multi-granularity entity recognition  Multi-task learning

Multi-granularity Medical Entity Recognition for Chinese Electronic Medical Records
ZHOU Xiao-jin, XU Chen-ming, RUAN Tong. Multi-granularity Medical Entity Recognition for Chinese Electronic Medical Records[J]. Computer Science, 2021, 48(4): 237-242.
Authors: ZHOU Xiao-jin  XU Chen-ming  RUAN Tong
Affiliation: School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; School of Science, East China University of Science and Technology, Shanghai 200237, China
Abstract: In existing named entity recognition tasks for Chinese clinical electronic medical records, the annotation granularity is usually either too fine or too coarse. It is difficult to find practical application scenarios for overly fine annotations, while overly coarse annotations usually need complex post-processing to clarify the standard form and semantic type of each entity before they can support subsequent data mining applications. To simplify these post-processing steps, 9 types of fine-grained analytical entities are defined to explain coarse-grained entities, according to the characteristics of 7 common types of coarse-grained clinical entities. In addition, according to the characteristics of multi-granularity entities, a multi-granularity clinical entity recognition model based on multi-task learning and a self-attention mechanism is proposed, and 5000 texts containing multi-granularity entities are annotated from real hospital electronic medical records to verify the model. Experimental results show that the model outperforms mainstream sequence labeling models. On the coarse-grained and fine-grained entity recognition tasks, its F1 scores reach 92.88 and 85.48, respectively.
Keywords:Electronic medical records  Multi-granularity named entity recognition  Multi-task learning
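
The abstract reports separate F1 scores for coarse-grained and fine-grained entities but does not say how they are computed. The sketch below assumes the common exact-match, entity-level convention, in which a predicted entity counts as correct only when both its span and its label match the gold annotation. The span encoding, the label names (`Symptom`, `BodyPart`, `Severity`, `Finding`) and the toy annotations are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical span encoding: (start, end, label), with end exclusive.
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match entity-level F1 (assumed convention, not confirmed by the paper)."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for one text: a single coarse-grained entity that is
# explained by three fine-grained entities nested inside it.
coarse_gold = {(0, 6, "Symptom")}
coarse_pred = {(0, 6, "Symptom")}
fine_gold = {(0, 2, "BodyPart"), (2, 4, "Severity"), (4, 6, "Finding")}
fine_pred = {(0, 2, "BodyPart"), (2, 6, "Finding")}  # one boundary error

# Each granularity is scored independently, as in the two reported figures.
coarse_score = entity_f1(coarse_gold, coarse_pred)
fine_score = entity_f1(fine_gold, fine_pred)
print(f"coarse F1 = {coarse_score:.2f}, fine F1 = {fine_score:.2f}")
```

Scoring the two granularities separately makes it visible that a model can recover a coarse-grained entity perfectly while still misplacing boundaries of the fine-grained entities inside it, which is consistent with the fine-grained score being the lower of the two reported values.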
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).