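Unsupervised Domain-Adapted Machine Translation Based on Improving the Quality of Pseudo-Parallel Sentence Pairs
Cite this article: XIAO Ni-ni, JIN Chang, DUAN Xiang-yu. Unsupervised domain-adapted machine translation based on improving the quality of pseudo-parallel sentence pairs[J]. Computer Engineering & Science, 2022, 44(12): 2230-2237
Authors: XIAO Ni-ni, JIN Chang, DUAN Xiang-yu
Affiliation: (Natural Language Processing Laboratory, School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006)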
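Abstract: The strong performance of neural machine translation (NMT) systems depends on large-scale in-domain bilingual parallel data. When data for a specific domain is scarce or unavailable, domain adaptation offers a good solution. Unsupervised domain adaptation methods construct a pseudo-parallel corpus and use it to fine-tune a pre-trained translation model. However, existing methods do not fully account for linguistic properties such as semantics and sentiment, so translations in the target domain contain many errors and much noise, which degrades the model's cross-domain performance. To alleviate this problem, this paper improves the quality of pseudo-parallel sentence pairs from two angles, the model and the data, in order to strengthen the model's domain adaptation ability. First, a more reasonable pre-training strategy is proposed to learn more general text representations from out-of-domain data, which enhances the model's generalization ability and improves the accuracy of in-domain translations. Then, sentence-level sentiment information is incorporated into posterior filtering to further improve the quality of the pseudo-corpus. Experiments show that, compared with a strong back-translation baseline, the method improves BLEU by 1.25 on Chinese-English and by 1.38 on English-Chinese, effectively improving translation quality.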

Keywords: neural network; neural machine translation; domain adaptation; model optimization; sentiment information
Received: 2021-04-26
Revised: 2021-09-13
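The abstract measures its gains against a back-translation baseline, which is the standard way to turn in-domain monolingual text into pseudo-parallel data: each authentic target-side sentence is paired with a synthetic source sentence produced by a reverse (target-to-source) model, and the forward model is then fine-tuned on those pairs. The minimal Python sketch below shows only that construction step. `back_translate`, `toy_reverse`, and the tiny lexicon are hypothetical stand-ins, not the authors' code; a real system would call a trained NMT model in place of `toy_reverse`.

```python
from typing import Callable, List, Tuple

# (synthetic source sentence, authentic in-domain target sentence)
PseudoPair = Tuple[str, str]


def back_translate(
    target_mono: List[str],
    reverse_translate: Callable[[str], str],
) -> List[PseudoPair]:
    """Build pseudo-parallel pairs from in-domain target-side monolingual text.

    `reverse_translate` is a placeholder for a target->source model trained on
    out-of-domain parallel data; any callable mapping str -> str works here.
    """
    pairs: List[PseudoPair] = []
    for line in target_mono:
        tgt = line.strip()
        if not tgt:
            continue  # skip blank lines instead of emitting empty pairs
        src = reverse_translate(tgt)  # synthetic (possibly noisy) source side
        pairs.append((src, tgt))
    return pairs


# Hypothetical word-by-word en->zh "reverse model" for a zh->en system.
TOY_EN_ZH = {"the": "这", "patient": "病人", "recovered": "康复了"}


def toy_reverse(sentence: str) -> str:
    return "".join(TOY_EN_ZH.get(w, w) for w in sentence.lower().split())


pseudo_corpus = back_translate(["The patient recovered", "", "patient"], toy_reverse)
print(pseudo_corpus)
# [('这病人康复了', 'The patient recovered'), ('病人', 'patient')]
```

Because only the source side is synthetic, the target side the model learns to generate stays authentic in-domain text; the noise the abstract describes enters through the reverse model's output.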

Unsupervised domain-adapted machine translation based on improving the quality of pseudo-parallel sentence pairs
XIAO Ni-ni, JIN Chang, DUAN Xiang-yu. Unsupervised domain-adapted machine translation based on improving the quality of pseudo-parallel sentence pairs[J]. Computer Engineering & Science, 2022, 44(12): 2230-2237
Authors: XIAO Ni-ni, JIN Chang, DUAN Xiang-yu
Affiliation: (Natural Language Processing Laboratory, School of Computer Science and Technology, Soochow University, Suzhou 215006, China)
Abstract: The strong performance of neural machine translation (NMT) systems depends on large amounts of in-domain bilingual parallel data. When data for a specific domain is scarce or unavailable, domain adaptation is a good solution. Unsupervised domain adaptation strategies fine-tune a pre-trained translation model on a generated pseudo-parallel corpus. However, existing methods do not sufficiently consider the semantic and sentiment characteristics of language, so translations in the target domain contain many errors and much noise, which hurts the model's cross-domain performance. To alleviate this problem, this paper improves the quality of pseudo-parallel sentence pairs from both the model side and the data side, thereby improving the model's domain adaptation ability. First, a more reasonable pre-training strategy is proposed to learn more general text representations from out-of-domain data, which enhances the model's generalization ability and improves the accuracy of the generated in-domain pseudo-corpus. Then, sentence-level sentiment features are incorporated into posterior filtering to further improve the quality of the pseudo-parallel corpus. Experimental results show that, compared with a strong back-translation baseline, the method increases BLEU by 1.25 on Chinese-English and by 1.38 on English-Chinese translation, effectively improving translation performance.
Keywords: neural network; neural machine translation; domain adaptation; model optimization; sentiment information
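The abstract states that sentence-level sentiment is used for posterior filtering of the pseudo-parallel corpus, but it does not say how sentiment is scored or what criterion decides which pairs are dropped. One plausible reading, sketched below purely as an assumption, is a consistency check: discard a pair when the sentiment of its source and target sides differs by more than a threshold. The lexicon-based scorer, the `max_gap` value, and all function names are hypothetical; a practical system would more likely use a trained sentiment classifier for each language.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def lexicon_sentiment(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Mean polarity of tokens found in `lexicon`; 0.0 (neutral) if none match."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def sentiment_filter(
    pairs: List[Pair],
    src_score: Callable[[str], float],
    tgt_score: Callable[[str], float],
    max_gap: float = 0.5,
) -> List[Pair]:
    """Keep pairs whose source and target sentiment scores differ by <= max_gap."""
    return [
        (src, tgt)
        for src, tgt in pairs
        if abs(src_score(src) - tgt_score(tgt)) <= max_gap
    ]


# Hypothetical toy lexicons; Chinese input is assumed pre-segmented with spaces.
ZH_LEX = {"好": 1.0, "满意": 1.0, "糟糕": -1.0}
EN_LEX = {"good": 1.0, "satisfied": 1.0, "terrible": -1.0}

pseudo_pairs = [
    ("服务 很 好", "the service is good"),    # both positive: kept
    ("服务 很 糟糕", "the service is good"),  # polarity flipped: dropped
    ("我们 到 了", "we arrived"),             # both neutral: kept
]
kept = sentiment_filter(
    pseudo_pairs,
    src_score=lambda s: lexicon_sentiment(s.split(), ZH_LEX),
    tgt_score=lambda s: lexicon_sentiment(s.lower().split(), EN_LEX),
)
print(len(kept))  # 2
```

A gap threshold rather than exact label matching lets weakly polar or neutral pairs through and only removes pairs whose polarity clearly flips, which is the kind of back-translation error a sentiment check is positioned to catch.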