
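Corpus Construction for Named Entities and Entity Relations on Chinese Electronic Medical Records (中文电子病历命名实体和实体关系语料库构建)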
Cite this article: 杨锦锋, 关毅, 何彬, 曲春燕, 于秋滨, 刘雅欣, 赵永杰. 中文电子病历命名实体和实体关系语料库构建[J]. 软件学报 (Journal of Software), 2016, 27(11): 2725-2746.
Authors: 杨锦锋 (YANG Jin-Feng)  关毅 (GUAN Yi)  何彬 (HE Bin)  曲春燕 (QU Chun-Yan)  于秋滨 (YU Qiu-Bin)  刘雅欣 (LIU Ya-Xin)  赵永杰 (ZHAO Yong-Jie)
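Affiliations (in author order): authors 1-4: Web Intelligence Laboratory, Language Technology Research Center, Harbin Institute of Technology, Harbin, Heilongjiang 150001; author 5: Medical Record Room, the 2nd Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang 150086; author 6: Respiratory Department, the 2nd Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang 150086; author 7: Neurology Department, the 4th Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang 150001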
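Abstract: Electronic medical records (EMRs) are records written by medical staff that describe medical activities for individual patients; they contain a large amount of medical knowledge and patient health information. Information extraction research on EMRs, such as named entity recognition and entity relation extraction, is of great significance for clinical decision support, evidence-based medical practice, and personalized medical services, and constructing an annotated corpus of EMR named entities and entity relations is the first prerequisite for that research. Based on a survey of corpus construction for EMR named entities and entity relations in China and abroad, and taking the characteristics of Chinese EMRs into account, this work proposes an annotation scheme for named entities and entity relations suited to Chinese EMRs. Under the guidance and with the participation of physicians, a detailed annotation specification for named entities and entity relations was formulated, and an annotated corpus with a complete annotation scheme, a relatively large scale, and high consistency was constructed. The corpus contains 992 medical record texts; annotation agreement reaches 0.922 for named entities and 0.895 for entity relations. This lays a solid foundation for subsequent research on information extraction from Chinese EMRs.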

Keywords: Chinese electronic medical record  named entity  entity relation  annotation specification  annotated corpus construction
Received: 2014/12/3
Revised: 2015/6/24

Corpus Construction for Named Entities and Entity Relations on Chinese Electronic Medical Records
YANG Jin-Feng, GUAN Yi, HE Bin, QU Chun-Yan, YU Qiu-Bin, LIU Ya-Xin and ZHAO Yong-Jie. Corpus Construction for Named Entities and Entity Relations on Chinese Electronic Medical Records[J]. Journal of Software, 2016, 27(11): 2725-2746.
Authors: YANG Jin-Feng  GUAN Yi  HE Bin  QU Chun-Yan  YU Qiu-Bin  LIU Ya-Xin and ZHAO Yong-Jie
Affiliations (in author order): authors 1-4: Web Intelligence Laboratory, Language Technology Research Center, Harbin Institute of Technology, Harbin 150001, China; author 5: Medical Record Room, the 2nd Affiliated Hospital of Harbin Medical University, Harbin 150086, China; author 6: Respiratory Department, the 2nd Affiliated Hospital of Harbin Medical University, Harbin 150086, China; author 7: Neurology Department, the 4th Affiliated Hospital of Harbin Medical University, Harbin 150001, China
Abstract: An electronic medical record (EMR) is a patient's individual medical record, written by health care providers and stored in digital format, which holds a large amount of medical knowledge and information about the patient's personal health condition. Constructing an annotated corpus of named entities and entity relations in EMRs is a primary and fundamental task for information extraction, which plays an important role in clinical decision support, the practice of evidence-based medicine, and other medical applications. Based on a survey of current research on corpus construction for named entities and entity relations in EMRs, this research proposes an annotation scheme for named entities and entity relations in Chinese electronic medical records (CEMR) that reflects the characteristics of those records. Under the supervision of physicians, a complete and detailed annotation specification for CEMR is formulated, and an annotated corpus with high agreement is constructed. The corpus comprises 992 medical text documents, and the inter-annotator agreement (IAA) reaches 0.922 for named entity annotations and 0.895 for entity relation annotations. This work lays a solid foundation for subsequent research on information extraction in CEMR.
Keywords: Chinese electronic medical record  named entity  entity relation  annotation specification  annotated corpus construction