Exploring a Low-Resource Iterative Paraphrase Generation Enhancement Method

Citation: ZHANG Lin, LIU Mingtong, ZHANG Yujie, XU Jin'an, CHEN Yufeng. Exploring a low-resource iterative paraphrase generation enhancement method[J]. CAAI Transactions on Intelligent Systems, 2022, 17(4): 680-687. DOI: 10.11992/tis.202106032

Authors: ZHANG Lin  LIU Mingtong  ZHANG Yujie  XU Jin'an  CHEN Yufeng

Affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Abstract: Paraphrase generation aims to rewrite a given sentence, within the same language, into sentences that are semantically consistent but differently expressed. The success of current paraphrase generation models based on deep neural networks depends on large-scale parallel paraphrase corpora; when such a model faces a new language or a new domain, its performance drops sharply. To address this problem, we propose a low-resource iterative paraphrase generation enhancement method that makes maximal use of monolingual corpora and a small-scale parallel paraphrase corpus: the paraphrase generation model is trained iteratively and used to generate pseudo paraphrase data, which in turn strengthens the model. In addition, we propose a pseudo-data screening algorithm designed around three criteria (sentence fluency, semantic similarity, and expression diversity) to select high-quality pseudo paraphrase pairs for each round of iterative training. Experimental results on the public Quora dataset show that, using only 30% of the paraphrase corpus, the proposed method surpasses the baseline model on both semantic and diversity metrics, verifying its effectiveness.

Keywords: low-resource   iterative   paraphrase generation   data augmentation   screening algorithm   neural network model   encoder–decoder framework   attention mechanism
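The screening step described in the abstract can be sketched as a three-way threshold filter over candidate (source, paraphrase) pairs. The sketch below is illustrative only: the paper's actual scorers (e.g., a language model for fluency, sentence embeddings for semantic similarity) are abstracted as callables, and the toy demonstration substitutes placeholder metrics (word-overlap Jaccard for similarity, a constant fluency score). Threshold values are likewise hypothetical.

```python
from typing import Callable, List, Tuple

def filter_pseudo_data(
    pairs: List[Tuple[str, str]],
    fluency_fn: Callable[[str], float],          # higher = more fluent (e.g., negative LM perplexity)
    similarity_fn: Callable[[str, str], float],  # semantic closeness of source and paraphrase
    diversity_fn: Callable[[str, str], float],   # surface difference between source and paraphrase
    min_fluency: float,
    min_similarity: float,
    min_diversity: float,
) -> List[Tuple[str, str]]:
    """Keep only pseudo paraphrase pairs that pass all three quality thresholds."""
    return [
        (src, para)
        for src, para in pairs
        if fluency_fn(para) >= min_fluency
        and similarity_fn(src, para) >= min_similarity
        and diversity_fn(src, para) >= min_diversity
    ]

# Toy demonstration with placeholder scorers: word-overlap Jaccard stands in
# for embedding-based semantic similarity, and fluency is a constant.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

pairs = [
    ("how do i learn python", "what is the best way to learn python"),
    ("how do i learn python", "how do i learn python"),  # identical: no diversity
]
kept = filter_pseudo_data(
    pairs,
    fluency_fn=lambda s: 1.0,
    similarity_fn=jaccard,
    diversity_fn=lambda s, p: 1.0 - jaccard(s, p),
    min_fluency=0.5,
    min_similarity=0.1,
    min_diversity=0.3,
)
# The identical pair is rejected for lacking diversity; only the first survives.
assert kept == [pairs[0]]
```

In each training round, the surviving pairs would be appended to the parallel corpus before the model is retrained, so the thresholds directly control the quality/quantity trade-off of the pseudo data.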