Improving Unsupervised Neural Machine Translation with Quality Estimation
Cite this article: XU Jia, YE Na, ZHANG Guiping, LI Tianyu. Improving Unsupervised Neural Machine Translation with Quality Estimation[J]. Journal of Chinese Information Processing, 2021, 35(3): 51-59
Authors: XU Jia, YE Na, ZHANG Guiping, LI Tianyu
Affiliation: Human-Computer Intelligence Research Center, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
Funding: Youth Fund of the Humanities and Social Sciences Research Program of the Ministry of Education (19YJC740107); National Natural Science Foundation of China (U1908216); Key Research and Development Program of Liaoning Province (2019JHZ/10100020)
Abstract: Traditionally, neural machine translation relies on large-scale bilingual parallel corpora. Unsupervised neural machine translation avoids this heavy dependence on bilingual parallel data, which makes it better suited to low-resource languages and domains. During training, however, it generates pseudo-parallel data, and the quality of that data plays a decisive role in the final translation quality. We therefore propose an unsupervised neural machine translation model that uses quality estimation to control the quality of the generated pseudo-parallel data. Specifically, during back-translation, a quality estimation model scores the generated pseudo-parallel pairs, and the pairs judged to be of higher quality by the HTER-based score are selected to train the neural network. Compared with the baseline system, BLEU improves by 0.79 and 0.55 points on the WMT 2019 German-English and Czech-English monolingual news corpora, respectively.
Keywords: unsupervised neural machine translation; back-translation; quality estimation
Received: 2020-02-07
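The abstract describes the core step only at a high level: back-translate monolingual text, score each synthetic pair with a quality estimation (QE) model, and train on the better-scoring pairs. The Python sketch below illustrates that selection step under stated assumptions; it is not the authors' implementation. The names `BackTranslation`, `select_by_qe`, `to_training_pairs`, and `toy_hter`, the fixed `keep_ratio` cut-off, and the length-based placeholder scorer are all hypothetical, since the abstract does not say whether selection uses a threshold or a fixed fraction. Because HTER is an edit rate, the sketch treats a lower predicted value as better by default.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class BackTranslation:
    monolingual: str  # authentic sentence taken from the monolingual corpus
    synthetic: str    # machine translation of it produced by back-translation


def toy_hter(source: str, translation: str) -> float:
    """Hypothetical stand-in for a trained QE model.

    Returns the relative length mismatch, which is NOT real HTER; it only
    gives the example something deterministic to rank by (lower = better).
    """
    s, t = len(source.split()), len(translation.split())
    return abs(s - t) / max(s, 1)


def select_by_qe(
    candidates: List[BackTranslation],
    qe_score: Callable[[str, str], float],
    keep_ratio: float = 0.5,
    lower_is_better: bool = True,
) -> List[BackTranslation]:
    """Keep the best-scoring fraction of back-translated pairs (assumed rule).

    qe_score(source, translation) estimates translation quality without a
    reference. For predicted HTER (an edit rate) lower is better, so the
    default ranks ascending; pass lower_is_better=False for a score where
    higher is better. Ties keep their original order (sorted() is stable).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    ranked = sorted(
        candidates,
        key=lambda c: qe_score(c.monolingual, c.synthetic),
        reverse=not lower_is_better,
    )
    return ranked[: int(len(ranked) * keep_ratio)]


def to_training_pairs(selected: List[BackTranslation]) -> List[Tuple[str, str]]:
    # Back-translated data trains the opposite direction:
    # synthetic sentence as input, authentic sentence as output.
    return [(c.synthetic, c.monolingual) for c in selected]


if __name__ == "__main__":
    pool = [
        BackTranslation("the cat sits on the mat", "die Katze sitzt auf der Matte"),
        BackTranslation("the weather is nice today", "schön"),
        BackTranslation("I read a book", "ich lese ein Buch"),
        BackTranslation("we met yesterday", "wir trafen uns gestern im Park am See"),
    ]
    kept = select_by_qe(pool, toy_hter, keep_ratio=0.5)
    for src, tgt in to_training_pairs(kept):
        print(src, "->", tgt)
```

Keeping a fixed fraction keeps the amount of synthetic training data predictable from one back-translation round to the next; an HTER threshold would be an equally plausible reading of the abstract. In a real system `toy_hter` would be replaced by a trained QE model.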