A Generative Adversarial Network for Pre-trained Language Models Based on Reinforcement Learning
Cite this article: YAN Junqi, SUN Shuifa, WU Yirong, PEI Wei, DONG Fangmin. A Generative Adversarial Network for Pre-trained Language Models Based on Reinforcement Learning[J]. Journal of Chinese Information Processing, 2022, 36(4): 20-28.
Authors: YAN Junqi  SUN Shuifa  WU Yirong  PEI Wei  DONG Fangmin
作者单位:1.三峡大学 智慧医疗宜昌市重点实验室,湖北 宜昌 443002;
2.三峡大学 计算机与信息学院,湖北 宜昌 443002
Funding: National Natural Science Foundation of China (U1703261); National Social Science Foundation of China (20BTQ066)
Abstract: Pre-trained language models such as BERT and XLNet are trained on large-scale unsupervised corpora with a language modeling objective based on the cross-entropy loss, yet they are evaluated by perplexity or by their performance on downstream natural language processing tasks, so the training loss and the evaluation metrics do not match. To address this problem, this paper proposes RL-XLNet (Reinforcement Learning-XLNet), an adversarially pre-trained language model that incorporates reinforcement learning. RL-XLNet adversarially trains a generator that predicts selected words from their context, along with a discriminator that judges whether each word the generator predicts is correct. Through the mutual reinforcement of the generator and the discriminator, the generator's understanding of semantics is strengthened and the model's learning ability improves. Because text generation involves a sampling step, the final loss cannot be back-propagated directly, so the generator is trained with reinforcement learning. Experiments on the General Language Understanding Evaluation benchmark (GLUE) and the Stanford question answering task (SQuAD 1.1) show that RL-XLNet holds a clear advantage over the existing BERT and XLNet models across multiple tasks: it ranks 1st on six GLUE tasks, 2nd on one task, and 3rd on one task, and its F1 score ranks 1st on SQuAD 1.1. Given the limited computing resources, the model's performance trained on a small corpus also reaches the advanced level of the field.
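The training scheme the abstract describes lends itself to a compact illustration. The sketch below is a toy PyTorch rendering of the two-player setup, assuming a replaced-token-detection style discriminator and a REINFORCE policy gradient carrying the discriminator's reward past the non-differentiable sampling step; the tiny linear modules, tensor shapes, and function names are hypothetical stand-ins for the paper's XLNet-based components, not its actual implementation.

```python
# Toy sketch of the adversarial pre-training loop described in the abstract:
# a generator predicts masked words, a discriminator judges whether each
# prediction is correct, and REINFORCE propagates the discriminator's reward
# through the non-differentiable sampling step. All modules and shapes here
# are illustrative assumptions, not the paper's XLNet-based architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden = 1000, 64
emb = nn.Embedding(vocab_size, hidden)      # shared token embeddings
generator = nn.Linear(hidden, vocab_size)   # stand-in for the generator head
discriminator = nn.Linear(hidden, 1)        # scores "is this token correct?"

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(
    list(discriminator.parameters()) + list(emb.parameters()), lr=1e-4)

def train_step(context_repr, original_tokens):
    """One adversarial step on a batch of masked positions.
    context_repr: (batch, hidden) contextual encoding of each masked position.
    original_tokens: (batch,) ground-truth token ids at those positions.
    """
    # Generator: sample a predicted token for each masked position.
    logits = generator(context_repr)
    dist = torch.distributions.Categorical(logits=logits)
    sampled = dist.sample()                 # sampling blocks back-propagation

    # Discriminator: original tokens -> 1, generated tokens -> 0.
    d_real = discriminator(emb(original_tokens)).squeeze(-1)
    d_fake = discriminator(emb(sampled)).squeeze(-1)
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator via REINFORCE: reward is the discriminator's belief that the
    # sampled token is correct; maximize E[reward] via reward * log p(sampled).
    with torch.no_grad():
        reward = torch.sigmoid(discriminator(emb(sampled))).squeeze(-1)
    g_loss = -(reward * dist.log_prob(sampled)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Toy usage with random tensors standing in for encoder outputs.
ctx = torch.randn(8, hidden)
gold = torch.randint(0, vocab_size, (8,))
train_step(ctx, gold)
```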

Keywords: natural language processing  pre-training  language model  reinforcement learning

A Generative Adversarial Network for Pre-trained Language Models Based on Reinforcement Learning
YAN Junqi, SUN Shuifa, WU Yirong, PEI Wei, DONG Fangmin. A Generative Adversarial Network for Pre-trained Language Models Based on Reinforcement Learning[J]. Journal of Chinese Information Processing, 2022, 36(4): 20-28.
Authors: YAN Junqi  SUN Shuifa  WU Yirong  PEI Wei  DONG Fangmin
Affiliation:1.Yichang Key Laboratory of Intelligent Medicine, China Three Gorges University, Yichang, Hubei 443002, China;
2.School of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China
Abstract: Pre-trained language models built on large-scale unsupervised corpora, such as BERT and XLNet, routinely use cross-entropy as the training loss but are evaluated by perplexity or by downstream task performance. To address this mismatch between the training objective and the evaluation metrics, an improved pre-trained language model named RL-XLNet is proposed, combining Generative Adversarial Networks (GAN) and Reinforcement Learning (RL). A generative model is trained to predict selected words, and a discriminative model is trained to judge whether each predicted token is correct. Because the sampling step in generation blocks direct back-propagation, reinforcement learning is adopted to train the generator. Through the interaction of the generator and the discriminator, the learning of semantic information is enhanced. Experiments on the GLUE benchmark and the SQuAD question answering benchmark show that RL-XLNet outperforms the traditional BERT and XLNet models on multiple natural language processing tasks: it ranks first on six GLUE tasks and first by F1 score on SQuAD.
Keywords:natural language processing  pre-training  language model  reinforcement learning  