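Research on Sentence Length Sensitivity in Neural Network Machine Translation
Cite this article: Alim Samat, Sirajahmat Ruzmamat, Maihefureti, Aishan Wumaier, Wushuer Silamu, Turgun Ebrayim. Research on Sentence Length Sensitivity in Neural Network Machine Translation[J]. Computer Engineering and Applications, 2022, 58(9): 195-200.
Authors: Alim Samat  Sirajahmat Ruzmamat  Maihefureti  Aishan Wumaier  Wushuer Silamu  Turgun Ebrayim
Affiliation: Laboratory of Multi-Language Information Technology, College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Funding: National Natural Science Foundation of China; Research Project of the State Language Commission of China; Open Fund of the Key Laboratory of Xinjiang Uygur Autonomous Region, China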
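Abstract: With the development of deep learning, neural machine translation (NMT) has made substantial progress. NMT is known to be sensitive to sentence length. To make full use of large-scale parallel corpora while taking sentence-length information into account, this paper divides the original parallel corpus into several subsets by sentence length, trains a sub-model on each subset, and proposes an NMT method based on a sentence-length fusion strategy. After training, translations are obtained by fusing the sub-models defined by the sentence-length boundaries and by reranking candidates with a fusion of three features: perplexity, sentence length ratio, and a classifier score. Experimental results show that the proposed method improves BLEU by about 1.2 points on average across three different test sets on the English-Chinese task, and by 0.4 to 0.6 points on the Uyghur-Chinese task, indicating that the method offers a useful reference.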
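The abstract states that the corpus is split by sentence length and that one sub-model is trained per subset, but it does not give the boundaries or say exactly how the sub-models are fused at inference time. The Python sketch below shows one plausible reading: partition (source, target) pairs into length buckets and route each test sentence to the sub-model trained on its bucket. The boundary values in `BOUNDARIES`, whitespace tokenization, routing by source length, and the `sub_models` mapping are all assumptions for illustration, not the authors' implementation.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical upper bounds (in source tokens) for each length bucket; the
# paper's real boundaries are not stated in the abstract.
# Buckets: 0 -> 1-10, 1 -> 11-20, 2 -> 21-40, 3 -> 41+
BOUNDARIES = [10, 20, 40]


def bucket_id(sentence, boundaries=BOUNDARIES):
    """Index of the length bucket for a whitespace-tokenized sentence."""
    n = len(sentence.split())
    # bisect_left puts a length equal to a boundary into the lower bucket
    return bisect_left(boundaries, n)


def partition_corpus(pairs, boundaries=BOUNDARIES):
    """Group (source, target) pairs by the length bucket of the source side.

    Each resulting subset would be used to train one sub-model.
    """
    buckets = defaultdict(list)
    for src, tgt in pairs:
        buckets[bucket_id(src, boundaries)].append((src, tgt))
    return dict(buckets)


def route(sentence, sub_models, boundaries=BOUNDARIES):
    """Select the sub-model trained on the bucket matching the input length.

    `sub_models` maps bucket index -> model object (hypothetical interface).
    """
    return sub_models[bucket_id(sentence, boundaries)]


if __name__ == "__main__":
    corpus = [
        ("a short one", "t1"),
        (" ".join(["w"] * 15), "t2"),
        (" ".join(["w"] * 50), "t3"),
    ]
    sizes = {k: len(v) for k, v in sorted(partition_corpus(corpus).items())}
    print(sizes)  # {0: 1, 1: 1, 3: 1}
    models = {i: f"sub_model_{i}" for i in range(len(BOUNDARIES) + 1)}
    print(route("hello world", models))  # sub_model_0
```

Hard routing is the simplest form of boundary-based fusion; the paper's "model fusion" step may instead combine the outputs of several sub-models, which this sketch does not attempt.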

Keywords: machine translation  extreme sentence length data  perplexity (PPL)  ensemble  deep learning
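The abstract names the three reranking features (perplexity, sentence length ratio, and a classifier) but not how they are combined or what the classifier predicts. The Python sketch below assumes a weighted linear sum over an n-best list: perplexity is computed from per-token log-probabilities as the exponential of the negative mean, the length feature is the deviation of the target/source length ratio from an assumed expected ratio, and the classifier score is supplied from outside. The `Candidate` type, the weights, and `expected_ratio` are hypothetical; in practice such weights would typically be tuned on a development set.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str                    # hypothesis, whitespace-tokenized (pre-segmented for Chinese)
    token_logprobs: list[float]  # per-token log-probabilities from some scoring model
    classifier_score: float      # output of a hypothetical quality classifier, e.g. in [0, 1]


def perplexity(token_logprobs):
    """PPL = exp(-(1/N) * sum log p(w_i)); lower means more fluent."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def length_ratio_penalty(source, hypothesis, expected_ratio=1.0):
    """|len(hyp)/len(src) - expected_ratio|; 0 when the ratio matches."""
    src_len = max(len(source.split()), 1)
    return abs(len(hypothesis.split()) / src_len - expected_ratio)


def rerank(source, candidates, weights=(1.0, 1.0, 1.0), expected_ratio=1.0):
    """Sort candidates by a weighted sum of the three features, best first."""
    w_ppl, w_len, w_clf = weights

    def score(c):
        return (
            -w_ppl * math.log(perplexity(c.token_logprobs))  # reward low perplexity
            - w_len * length_ratio_penalty(source, c.text, expected_ratio)
            + w_clf * c.classifier_score
        )

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    src = "this is a test"
    good = Candidate("a b c d", [-0.1] * 4, 0.9)
    bad = Candidate("a b", [-2.0] * 2, 0.1)
    print([c.text for c in rerank(src, [bad, good])])  # ['a b c d', 'a b']
```

Because log(PPL) equals the negative mean token log-probability, the first term simply rewards a higher average log-likelihood, which keeps it on a scale comparable to the other two features.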

This article is indexed in Wanfang Data and other databases.