
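Named Entity Extraction of Traditional Chinese Medicine Medical Records Based on Conditional Random Field
Cite this article: LIU Kai, ZHOU Xue-zhong, YU Jian, ZHANG Run-shun. Named Entity Extraction of Traditional Chinese Medicine Medical Records Based on Conditional Random Field[J]. Computer Engineering (计算机工程), 2014(9): 312-316.
Authors: LIU Kai; ZHOU Xue-zhong; YU Jian; ZHANG Run-shun
Affiliations: [1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; [2] Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044; [3] Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053
Funding: National Natural Science Foundation of China (61105055, 81230086); National "863" Program (2012AA02A609); Fundamental Research Funds for the Central Universities (K13JB00140)
Abstract: Traditional Chinese Medicine (TCM) clinical records are an important research data resource for TCM, but they are still recorded mainly as text. Structured information extraction is a prerequisite for in-depth analysis of the record data, and named entity extraction is its foundational step. To address the extraction of named entities in TCM clinical records, such as symptoms, diseases, and inducing factors, this work uses 413 manually annotated records (with Chinese characters as features) and four types of feature templates to apply Conditional Random Field (CRF), Hidden Markov Model (HMM), and Maximum Entropy Markov Model (MEMM) to TCM record named entity extraction, and compares the results. The results show that, combined with suitable feature templates, the CRF method performs well, with F1 values of 0.80 for symptoms, 0.74 for disease names, and 0.74 for inducing factors. Compared with HMM and MEMM, CRF has the highest precision and recall, making it a well-suited method for named entity extraction from TCM clinical records.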

Keywords: TCM clinical medical records; named entity extraction; corpus annotation system; conditional random field; feature template

Named Entity Extraction of Traditional Chinese Medicine Medical Records Based on Conditional Random Field
Affiliation: LIU Kai, ZHOU Xue-zhong, YU Jian, ZHANG Run-shun (1a. School of Computer and Information Technology; 1b. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China; 2. Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China)
Abstract: Traditional Chinese Medicine (TCM) medical records are an important data resource for TCM research. They are still recorded mainly as free text, so structured information must be extracted before the records can be analyzed in depth, and named entity extraction is the basic step. This paper uses 413 manually labeled medical records, with Chinese characters as features, and four types of feature templates to study the extraction of named entities such as symptoms, diseases, and inducing factors. It compares the results of TCM medical record named entity extraction by Conditional Random Field (CRF), Hidden Markov Model (HMM), and Maximum Entropy Markov Model (MEMM). Combined with appropriate feature templates, CRF performs well, with F1 values of 0.80 for symptoms, 0.74 for disease names, and 0.74 for inducing factors. Compared with HMM and MEMM, CRF has the highest precision and recall. These preliminary results show that CRF is an applicable method for named entity extraction from TCM medical records.
Keywords: Traditional Chinese Medicine (TCM) medical records; named entity extraction; corpus annotation system; Conditional Random Field (CRF); feature template
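
The abstract mentions character-level features and feature templates but does not list the four template types. As a rough illustration only, the sketch below shows what character-window templates for a CRF tagger might look like, together with a helper that turns BIO tags into entity spans. The window size, the template names (`U[...]`, `B[...]`), the `<PAD>` symbol, the `SYMPTOM` label, and the BIO tagging scheme are all assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def char_at(chars: List[str], j: int) -> str:
    """Return the character at index j, or PAD when j is out of range."""
    return chars[j] if 0 <= j < len(chars) else PAD


def char_window_features(chars: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Build unigram and bigram context features around position i.

    These templates are illustrative; the paper's actual templates are not
    given in the abstract.
    """
    feats: Dict[str, str] = {}
    # Unigram templates: the character at each offset in the window.
    for off in range(-window, window + 1):
        feats[f"U[{off}]"] = char_at(chars, i + off)
    # Bigram templates: each pair of adjacent characters in the window.
    for off in range(-window, window):
        feats[f"B[{off},{off + 1}]"] = char_at(chars, i + off) + char_at(chars, i + off + 1)
    return feats


def sentence_features(text: str, window: int = 2) -> List[Dict[str, str]]:
    """One feature dict per character, the usual input shape for CRF toolkits."""
    chars = list(text)
    return [char_window_features(chars, i, window) for i in range(len(chars))]


def bio_to_spans(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, label) pairs from BIO tags (assumed tagging scheme)."""
    spans: List[Tuple[str, str]] = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append(("".join(chars[start:i]), label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans
```

Feature dictionaries of this shape could be passed to a CRF toolkit, and `bio_to_spans` applied to both gold and predicted tags would give the entity sets needed to compute precision, recall, and F1 per entity type.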
This article is indexed in CNKI, VIP, and other databases.