
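Chinese Medical Named Entity Recognition Based on Data Augmentation
Cite this article: WANG Peng-hui, LI Ming-zheng, LI Si. Chinese Medical Named Entity Recognition Based on Data Augmentation[J]. Journal of Beijing University of Posts and Telecommunications, 2020, 43(5): 84-90. DOI: 10.13190/j.jbupt.2020-032
Authors: WANG Peng-hui (王蓬辉)  LI Ming-zheng (李明正)  LI Si (李思)
Affiliation: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876
Funding: National Natural Science Foundation of China (61702047)
Abstract: Because large amounts of annotated data are lacking, Chinese medical named entity recognition mainly relies on external resources to improve recognition performance, and incorporating those resources takes considerable time and effective rules. To address the shortage of annotated data, a data augmentation algorithm based on generative adversarial networks is proposed; it automatically generates large amounts of annotated data and improves medical entity recognition performance. Experimental results show that the algorithm outperforms the baseline models used in the experiments, demonstrating its effectiveness for medical entity recognition.

Keywords: named entity recognition  data augmentation  sequence generative adversarial network
Received: 2020-03-24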

Data Augmentation for Chinese Clinical Named Entity Recognition
WANG Peng-hui,LI Ming-zheng,LI Si. Data Augmentation for Chinese Clinical Named Entity Recognition[J]. Journal of Beijing University of Posts and Telecommunications, 2020, 43(5): 84-90. DOI: 10.13190/j.jbupt.2020-032
Authors:WANG Peng-hui  LI Ming-zheng  LI Si
Affiliation:School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract: Chinese clinical named entity recognition plays an important role in identifying the medical entities contained in Chinese electronic medical records. Because large annotated datasets are lacking, most existing methods concentrate on employing external resources to improve the performance of clinical named entity recognition, which requires considerable time and effective rules. To address the lack of annotated data, a data augmentation method based on a sequence generative adversarial network is used to generate more diverse data from the entities and non-entities in the training set. Experiments show that when the generated data are used to expand the training set, the proposed named entity recognition system achieves competitive performance compared with state-of-the-art methods, which demonstrates the effectiveness of the data augmentation method.
Keywords: named entity recognition  data augmentation  sequence generative adversarial network