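Chinese Clinical Named Entity Recognition Based on Data Augmentation

Cite this article:	WANG Peng-hui, LI Ming-zheng, LI Si. Chinese Clinical Named Entity Recognition Based on Data Augmentation[J]. Journal of Beijing University of Posts and Telecommunications, 2020, 43(5): 84-90. DOI: 10.13190/j.jbupt.2020-032

Authors:	WANG Peng-hui, LI Ming-zheng, LI Si

Affiliation:	School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China

Funding:	National Natural Science Foundation of China (61702047)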

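Abstract:	Because large amounts of annotated data are lacking, Chinese clinical named entity recognition mainly relies on external resources to improve recognition performance, and incorporating those resources requires considerable time and effective rules. To address the shortage of annotated data, a data augmentation algorithm based on a generative adversarial network is proposed that automatically generates large amounts of annotated data and improves clinical entity recognition performance. Experimental results show that the algorithm outperforms the baseline models used in the experiments, demonstrating its effectiveness for clinical entity recognition.
Keywords:	named entity recognition; data augmentation; sequence generative adversarial network
Received:	2020-03-24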
Data Augmentation for Chinese Clinical Named Entity Recognition

WANG Peng-hui, LI Ming-zheng, LI Si. Data Augmentation for Chinese Clinical Named Entity Recognition[J]. Journal of Beijing University of Posts and Telecommunications, 2020, 43(5): 84-90. DOI: 10.13190/j.jbupt.2020-032

Authors:	WANG Peng-hui, LI Ming-zheng, LI Si

Affiliation:	School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract:	Chinese clinical named entity recognition plays an important role in recognizing medical entities in Chinese electronic medical records. Because large annotated datasets are lacking, most existing methods rely on external resources to improve clinical named entity recognition performance, which requires considerable time and effective rules. To address the shortage of annotated data, a sequence generative adversarial network is used for data augmentation, generating more varied data from the entities and non-entities in the training set. Experiments show that when the generated data are used to expand the training set, the proposed named entity recognition system achieves competitive performance compared with state-of-the-art methods, which demonstrates the effectiveness of the data augmentation method.

Keywords:	named entity recognition; data augmentation; generative adversarial network
