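Semi-supervised Neural Machine Translation Based on Variational Information Bottleneck
Cite this article: Yu Zhiqiang, Yu Zhengtao, Huang Yuxin, Guo Junjun, Gao Shengxiang. Semi-supervised neural machine translation based on variational information bottleneck [J]. Acta Automatica Sinica, 2022, 48(7): 1678-1689.
Authors: Yu Zhiqiang, Yu Zhengtao, Huang Yuxin, Guo Junjun, Gao Shengxiang
Affiliation: 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500
Funding: Supported by the National Key Research and Development Program of China (2019QY1800), the National Natural Science Foundation of China (61732005, 61672271, 61761026, 61762056, 61866020), and the Natural Science Foundation of Yunnan Province (2018FB104)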
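Abstract: Variational methods are effective for machine translation, but their performance depends heavily on the amount of training data. In low-resource settings, parallel corpora are scarce and cannot meet this demand, so variational models translate poorly. To address this problem, this paper proposes a semi-supervised neural machine translation method based on the variational information bottleneck. The method proceeds in three steps. First, a base translation model is trained on a small-scale parallel corpus, with a cross-layer attention mechanism introduced to make full use of the features from every layer of the network. Next, the base translation model is used to generate a large-scale, noisy pseudo-parallel corpus from a monolingual corpus by back-translation, and the two parallel corpora are merged into a combined corpus that is large enough for variational methods. Finally, to reduce the noise in the combined corpus, the variational information bottleneck is used to add an intermediate representation between source and target; training gives this representation the ability to pass important information and block unimportant information, which removes noise. Experimental results on multiple datasets show that the proposed method significantly improves translation quality and is a semi-supervised neural machine translation method suited to low-resource scenarios.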

Keywords: neural machine translation; cross-layer attention mechanism; back-translation; variational information bottleneck
Received: 2019-06-24

Improving Semi-supervised Neural Machine Translation With Variational Information Bottleneck
Affiliation: 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; 2. School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500; 3. Yunnan Key Laboratory of Artificial Intelligence, Kunming 650500
Abstract: Variational approaches are effective in machine translation, but their performance depends heavily on the scale of the training data. In low-resource settings, parallel corpora are limited and cannot meet the data demands of variational approaches, which results in suboptimal translation quality. To address this problem, we propose a semi-supervised neural machine translation approach based on the variational information bottleneck. The central ideas are as follows: 1) a base translation model is trained on a small-scale parallel corpus using a cross-layer attention mechanism; 2) the trained base model generates a large-scale, noisy pseudo-parallel corpus from a monolingual corpus by back-translation, and the pseudo-parallel and parallel corpora are merged into a combined corpus; 3) the variational information bottleneck is used to reduce noise and eliminate redundant information in the combined corpus. Experimental results on multiple language pairs show that the proposed model effectively improves translation quality.
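
Steps 2) and 3) can be illustrated with a minimal sketch. It is not the authors' implementation: `back_translate` is a hypothetical callable standing in for the trained base model, the monolingual data is assumed to be target-side (the usual back-translation setup, which the abstract does not specify), and the bottleneck is assumed to use a diagonal Gaussian posterior with a standard normal prior and a weight `beta`, as in the common formulation of the variational information bottleneck.

```python
import math
import random


def build_combined_corpus(parallel_pairs, target_monolingual, back_translate):
    """Merge genuine pairs with pseudo-pairs produced by back-translation.

    back_translate is a hypothetical callable (target sentence -> source
    sentence) standing in for the trained base translation model.
    """
    pseudo_pairs = [(back_translate(t), t) for t in target_monolingual]
    return list(parallel_pairs) + pseudo_pairs


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def sample_latent(mu, log_var, rng=random):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def vib_loss(translation_nll, mu, log_var, beta=1e-3):
    """Translation loss plus a beta-weighted compression penalty on z.

    translation_nll is the negative log-likelihood of the target given the
    sampled intermediate representation; beta is an assumed hyperparameter.
    """
    return translation_nll + beta * gaussian_kl_to_standard_normal(mu, log_var)
```

The KL term penalizes information carried by the intermediate representation, so a larger `beta` compresses it more aggressively, while the translation term keeps whatever is needed to predict the target. How the paper balances the two terms is not stated in the abstract.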