
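Cross-department Chunking Based on Chinese Electronic Medical Record
Cite this article:DAI Xue, JIANG Zhi-peng, GUAN Yi. Cross-department Chunking Based on Chinese Electronic Medical Record[J]. Application Research of Computers, 2017, 34(7).
Authors:DAI Xue  JIANG Zhi-peng  GUAN Yi
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology (all three authors)
Abstract:Electronic medical records from different hospital departments differ from one another, yet annotating a new corpus is very costly. To address this problem, this work applies transfer learning to cross-department chunking of Chinese electronic medical records. On a purpose-built corpus of Chinese electronic medical records, structured SVM (SSVM) and CRF models are compared on part-of-speech tagging and chunking; SSVM performs better and is chosen as the base tagging model. In addition, an improved structural correspondence learning (SCL) algorithm is used for chunking, modified so that it can be applied to the SSVM model for domain adaptation. Experimental results show that the algorithm effectively improves cross-department domain adaptation in sequence labeling tasks.

Keywords:Chinese electronic medical record  part-of-speech tagging  chunking  domain adaptation  structured support vector machine
Received:2016/5/12
Revised:2017/5/16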

Cross-department Chunking Based on Chinese Electronic Medical Record
DAI Xue,JIANG Zhi-peng and GUAN Yi.Cross-department Chunking Based on Chinese Electronic Medical Record[J].Application Research of Computers,2017,34(7).
Authors:DAI Xue  JIANG Zhi-peng and GUAN Yi
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology (all three authors)
Abstract:Electronic medical records differ across hospital departments, yet annotating a new corpus is expensive. To address this problem, this paper applies transfer learning to cross-department chunking of Chinese electronic medical records (CEMRs). A comparison of SSVM and CRF on part-of-speech (POS) tagging and chunking over the constructed CEMR corpus shows that SSVM performs better, so it is chosen to train the base model. In addition, a modified structural correspondence learning (SCL) algorithm, adapted to work with SSVM, is proposed for domain adaptation on POS tagging and chunking. Experimental results show that the modified algorithm effectively improves domain adaptation across departments on sequence labeling tasks.
Keywords:Chinese electronic medical record  part-of-speech tagging  chunking  domain adaptation  Structured SVM
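
For readers unfamiliar with SCL, the following is a minimal sketch of the standard structural correspondence learning idea, not the modified SSVM-specific variant the paper proposes (the abstract does not describe that modification). It assumes a binary feature matrix `X` pooled from unlabeled source-department and target-department text and a hand-picked set of pivot feature columns. The function names `scl_projection` and `augment`, the `ridge` parameter, and the use of ridge regression for the pivot predictors are illustrative assumptions.

```python
import numpy as np

def scl_projection(X, pivot_idx, k, ridge=1.0):
    """Learn a k-dimensional shared projection from unlabeled data.

    X         : (n_samples, n_features) feature matrix pooled from both domains
    pivot_idx : column indices of pivot features (frequent in both domains)
    k         : number of shared dimensions to keep (at most the number of pivots)
    ridge     : L2 regularisation strength (assumed; a simplification of the
                loss normally used for pivot predictors)
    """
    n_features = X.shape[1]
    nonpivot_idx = np.setdiff1d(np.arange(n_features), pivot_idx)
    A = X[:, nonpivot_idx]   # inputs: non-pivot features
    Y = X[:, pivot_idx]      # targets: one column per pivot feature

    # One linear predictor per pivot, solved jointly in closed form:
    # W = (A^T A + ridge * I)^{-1} A^T Y, shape (n_nonpivot, n_pivots)
    W = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ Y)

    # The top-k left singular vectors of W span the shared feature subspace.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    theta = U[:, :k].T       # shape (k, n_nonpivot)
    return nonpivot_idx, theta

def augment(X, nonpivot_idx, theta):
    """Append the k projected features to the original feature vectors."""
    return np.hstack([X, X[:, nonpivot_idx] @ theta.T])
```

In such a setup, the augmented features would be fed to the sequence labeler (for example an SSVM or CRF tagger) trained on the labeled source department and then applied to the target department.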