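Data Augmentation for Domain Specific Reading Comprehension
Cite this article: LV Zhengwei, YANG Lei, SHI Zhizhong, LIANG Xiao, LEI Tao, LIU Duoxing. Data Augmentation for Domain Specific Reading Comprehension[J]. Journal of Chinese Information Processing, 2021, 35(11): 127-134
Authors: LV Zhengwei  YANG Lei  SHI Zhizhong  LIANG Xiao  LEI Tao  LIU Duoxing
Affiliation: Autohome Inc., Beijing 100080, China
Abstract: A reading comprehension question answering system takes an input question, analyzes unstructured documents with natural language processing techniques such as semantic understanding, and generates an answer. Such systems have high research and application value. In vertical (domain-specific) applications, however, reading comprehension QA data is expensive to annotate and users phrase their questions in complex and diverse ways, so these systems suffer from low accuracy and poor robustness. To address this problem, this paper proposes a data augmentation method for domain-specific reading comprehension QA that constructs reading comprehension training data from real user questions. The method lowers annotation cost and increases the diversity of the training data, which improves model accuracy and robustness. Experiments on automotive-domain data show that the method effectively improves both the accuracy and the robustness of reading comprehension models in a vertical domain.

Keywords: reading comprehension    data augmentation    question answering system
Received: 2021-02-21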
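The abstract does not say how training samples are built from real user questions. Purely as a hypothetical illustration of the two stated goals (lower annotation cost, more diverse question phrasings), the sketch below assumes a SQuAD-style extractive sample format and reuses one annotated passage/answer pair for several real user phrasings of the same question, so one annotation yields several training samples. The names `RCSample` and `expand_with_user_questions` and the toy car sentence are invented for illustration; this is not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RCSample:
    """One extractive reading-comprehension sample (SQuAD-style format is an assumption)."""
    context: str
    question: str
    answer_text: str
    answer_start: int  # character offset of the answer inside `context`


def expand_with_user_questions(seed: RCSample, user_questions: list[str]) -> list[RCSample]:
    """Hypothetical augmentation: attach extra real user phrasings to an
    already-annotated (context, answer) pair without new span annotation."""
    # Verify that the annotated span really matches the context text.
    end = seed.answer_start + len(seed.answer_text)
    if seed.context[seed.answer_start:end] != seed.answer_text:
        raise ValueError("answer span does not match context")

    samples = [seed]
    seen = {seed.question}
    for q in user_questions:
        q = q.strip()
        # Skip empty strings and exact duplicates of phrasings already kept.
        if q and q not in seen:
            seen.add(q)
            samples.append(RCSample(seed.context, q, seed.answer_text, seed.answer_start))
    return samples


if __name__ == "__main__":
    ctx = "This model uses a 2.0T engine with a maximum power of 180 kW."
    seed = RCSample(ctx, "What is the maximum power?", "180 kW", ctx.index("180 kW"))
    variants = ["How much power does it make?", "What is the maximum power?", "  "]
    for s in expand_with_user_questions(seed, variants):
        print(s.question)
```

In this toy setup the answer span is annotated once and every additional user phrasing reuses it; deduplication here is exact string matching only, which is a simplifying choice rather than anything the paper specifies.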
