
Named Entity Recognition on Chinese Electronic Medical Records Based on RoBERTa-WWM
Cite this article: ZHU Yan, ZHANG Li, WANG Yu. Named Entity Recognition on Chinese Electronic Medical Records Based on RoBERTa-WWM[J]. Computer and Modernization (计算机与现代化), 2021, 0(2): 51-55.
Authors: ZHU Yan, ZHANG Li, WANG Yu
Affiliations: School of Nursing, Bengbu Medical College, Bengbu 233030, Anhui, China; Science Island Branch, University of Science and Technology of China, Hefei 230001, Anhui, China
Funding: Major Special Project of the Anhui Provincial Department of Science and Technology; Graduate Student Science and Technology Innovation Program of Bengbu Medical College
Abstract: Electronic medical records (EMRs) contain rich information, such as clinical symptoms, diagnostic results, and drug efficacy. Named entity recognition (NER) aims to extract named entities from unstructured text, and it is the first step in extracting valuable information from EMRs. This paper proposes an NER method based on the pre-trained model RoBERTa-WWM (A Robustly Optimized BERT Pre-training Approach-Whole Word Masking), which is used to generate semantic representations that carry prior knowledge. Compared with BERT (Bidirectional Encoder Representations from Transformers), the semantic representations generated by RoBERTa-WWM are better suited to the Chinese NER task, because the model applies whole word masking during pre-training. These representations are fed first into a Bidirectional Long Short-Term Memory (BiLSTM) network and then into a Conditional Random Field (CRF) model. Experimental results show that the method improves the F1 score on the dataset of the China Conference on Knowledge Graph and Semantic Computing 2019 (CCKS 2019), and thus improves the recognition of named entities in Chinese EMRs.

Keywords: electronic medical records; named entity recognition; RoBERTa-WWM; information extraction
Received: 2021-03-01
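The abstract attributes the advantage of RoBERTa-WWM on Chinese text to whole word masking. The following is a minimal sketch of that idea only, not the authors' implementation or the actual pre-training procedure: it assumes the sentence has already been split into words (hand-segmented here), uses a hypothetical per-unit masking probability `mask_prob`, and omits the random-replacement and keep-as-is variants used in real BERT-style masking. The function names and the example sentence are illustrative assumptions.

```python
import random
from typing import List

MASK = "[MASK]"


def char_level_mask(words: List[str], mask_prob: float, rng: random.Random) -> List[str]:
    """Mask each character independently, so a word may be only partly hidden."""
    out: List[str] = []
    for word in words:
        for ch in word:
            out.append(MASK if rng.random() < mask_prob else ch)
    return out


def whole_word_mask(words: List[str], mask_prob: float, rng: random.Random) -> List[str]:
    """Make one masking decision per word and apply it to all of its characters."""
    out: List[str] = []
    for word in words:
        if rng.random() < mask_prob:
            out.extend([MASK] * len(word))  # hide the entire word
        else:
            out.extend(list(word))  # keep the entire word
    return out


if __name__ == "__main__":
    # Hand-segmented example sentence (an assumption, not taken from the dataset):
    # "patient / shows / hypertension / symptoms"
    words = ["患者", "出现", "高血压", "症状"]
    rng = random.Random(0)
    print("character-level:", char_level_mask(words, 0.3, rng))
    print("whole-word:     ", whole_word_mask(words, 0.3, rng))
```

With character-level masking, a multi-character medical term such as 高血压 can be partially visible, so the model may recover the masked character from its neighbours alone. Whole word masking removes that shortcut by hiding the term as a unit, which is the property the abstract credits for the better fit to Chinese NER.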
This article is indexed in Wanfang Data and other databases.