
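Application of Deep Generative Models to Clinical Term Normalization
Cite this article: YAN Jinghui, XIANG Lu, ZHOU Yu, SUN Jian, CHEN Si, XUE Chen. Application of Deep Generative Models to Clinical Term Normalization[J]. Journal of Chinese Information Processing, 2021, 35(5): 77-85.
Authors: YAN Jinghui  XIANG Lu  ZHOU Yu  SUN Jian  CHEN Si  XUE Chen
Affiliations: 1. Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044;
2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190;
3. Beijing Fanyu Technology Co. Ltd, Beijing 100080;
4. Fanyu AI Research, Beijing 100080;
5. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049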
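Abstract: Clinical term normalization is an indispensable part of medical statistics. In practice, a single standard clinical term may have several colloquial and non-standard descriptions, and for applications such as building a clinical knowledge base, normalizing these descriptions is a problem that must be addressed. This paper focuses on normalizing Chinese clinical terms, that is, mapping a non-standard Chinese clinical term description to the standard terms in a given clinical terminology base. Deep discriminative models have had some success in normalizing medical terms with simple text structure, such as disease and drug names. In Chinese clinical term normalization, however, the descriptions to be normalized often have missing information or "one-to-many" correspondences, so a discriminative model alone cannot capture the complete semantics and performs poorly. This paper treats clinical term normalization as analogous to a translation task: a deep generative model generates the core semantics of the description text and yields a candidate set of standard terms, and a BERT-based semantic similarity algorithm then re-ranks the candidate set to obtain the final standard terms. The method was evaluated on the evaluation data of the 5th China Conference on Health Information Processing (CHIP2019) and achieved very good results.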

Keywords: term normalization  core semantics  generative model
Received: 2020-06-15

Clinical Entity Normalization Using Deep Generative Model
YAN Jinghui,XIANG Lu,ZHOU Yu,SUN Jian,CHEN Si,XUE Chen.Clinical Entity Normalization Using Deep Generative Model[J].Journal of Chinese Information Processing,2021,35(5):77-85.
Authors:YAN Jinghui  XIANG Lu  ZHOU Yu  SUN Jian  CHEN Si  XUE Chen
Affiliation:1.School of Computer Science and Information Technology & Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China;2.National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;3.Beijing Fanyu Technology Co. Ltd, Beijing 100080, China;4.Fanyu AI Research, Beijing 100080, China;5.School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Abstract:Clinical entity normalization is an indispensable part of medical statistics. In practice, a standard clinical term may have several colloquial and non-standard mentions, and for applications such as clinical knowledge base construction, normalizing these mentions is an issue that has to be addressed. This paper focuses on Chinese clinical entity normalization, i.e., linking non-standard Chinese clinical entity mentions to the standard terms in a given clinical terminology base. Specifically, we treat clinical entity normalization as a translation task and employ a deep generative model to generate the core semantics of the clinical mentions and obtain a candidate set of standard entities. The final standard terms are obtained by re-ranking the candidate set with a BERT-based semantic similarity model. Experiments on the data of the 5th China Conference on Health Information Processing (CHIP2019) achieve good results.
Keywords:entity normalization  core semantics  generative model  
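
The generate-then-re-rank pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The candidate list is hard-coded where a trained generative model would produce it. Character-bigram Jaccard overlap stands in for the BERT-based semantic similarity model. Filtering candidates against the terminology base is an assumption about how generated text is tied back to standard terms. All names (`rerank`, `jaccard_similarity`, the toy terms) are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def char_bigrams(text: str) -> Set[str]:
    """Return the set of character bigrams (works for Chinese or English text)."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def jaccard_similarity(a: str, b: str) -> float:
    """Stand-in scorer; the paper uses a BERT-based similarity model instead."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y)


def rerank(
    mention: str,
    candidates: Iterable[str],
    terminology: Set[str],
    score: Callable[[str, str], float] = jaccard_similarity,
    top_k: int = 1,
) -> List[str]:
    """Keep generated candidates that are standard terms, then sort by similarity."""
    # Assumption: generated strings outside the terminology base are discarded.
    valid = [c for c in dict.fromkeys(candidates) if c in terminology]
    ranked = sorted(valid, key=lambda c: score(mention, c), reverse=True)
    return ranked[:top_k]


# Toy terminology base and mention (illustrative only, not CHIP2019 data).
terminology = {"total knee replacement", "hip replacement", "knee arthroscopy"}
mention = "left knee replacement surgery"

# Hard-coded stand-in for the output of a generative model.
generated = [
    "knee replacement operation",  # not a standard term, so it is filtered out
    "hip replacement",
    "total knee replacement",
    "knee arthroscopy",
]

best = rerank(mention, generated, terminology)
print(best)
```

Passing a different `score` function, for example one backed by a sentence-pair model, changes the ranking step without touching candidate generation, which mirrors the two-stage design the abstract describes.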