Application of a Domain Information Sharing Method in Mongolian-Chinese Machine Translation
Cite this article: ZHANG Zhen, SU Yila, NIU Xianghua, GAO Fen, ZHAO Yaping, Ren Qing Daoer Ji. Application of a Domain Information Sharing Method in Mongolian-Chinese Machine Translation[J]. Computer Engineering and Applications, 2020, 56(10): 106-114. DOI: 10.3778/j.issn.1002-8331.1905-0122
Authors: ZHANG Zhen  SU Yila  NIU Xianghua  GAO Fen  ZHAO Yaping  Ren Qing Daoer Ji
Affiliation: School of Information Engineering, Inner Mongolia University of Technology, Hohhot 010000, China
Funding: Fund of the Ethnic Affairs Commission of Inner Mongolia Autonomous Region; National Natural Science Foundation of China; Natural Science Foundation of Inner Mongolia Autonomous Region
Abstract: Mongolian-Chinese translation is a low-resource translation task that faces a scarcity of parallel corpus resources. To alleviate the low translation accuracy caused by scarce parallel data and a limited vocabulary, this work applies the dynamic pre-training method ELMo (Embeddings from Language Models) and combines it with a Transformer translation architecture that shares domain information across multiple tasks. ELMo (deep contextualized word representations) is used to pre-train monolingual corpora; the FastText word-embedding algorithm pre-trains the large-scale context-dependent text in the Mongolian-Chinese parallel corpus; and, following the principle of sharing parameters across tasks to realize domain information sharing, a one-to-many encoder-decoder model is constructed for Mongolian-Chinese neural machine translation. Experimental results show that the method improves translation quality over the Transformer baseline on long input sequences.
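The FastText step mentioned above relies on subword information: a word vector is assembled from character n-grams, so rare or unseen word forms still receive a representation. A minimal, illustrative sketch of that idea follows; the class name, hashing scheme, and dimensions are simplified assumptions for exposition, not the paper's implementation:

```python
import numpy as np


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams with FastText-style boundary markers '<' and '>'."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]


class SubwordEmbedder:
    """Toy sketch of FastText's subword idea: a word vector is the average
    of hashed character-n-gram vectors, so out-of-vocabulary words still
    map to a vector instead of an <unk> token."""

    def __init__(self, dim=8, buckets=1000, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(size=(buckets, dim))  # n-gram embedding table
        self.buckets = buckets

    def vector(self, word):
        ids = [hash(g) % self.buckets for g in char_ngrams(word)]
        return self.table[ids].mean(axis=0)
```

In the real FastText these n-gram vectors are trained with a skip-gram objective; here they are random, which is enough to show how any string, seen or unseen, deterministically resolves to a vector.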

Keywords: Mongolian-Chinese translation  multi-task learning  Transformer  ELMo  FastText

Application of a Domain Information Sharing Method in Mongolian-Chinese Machine Translation
ZHANG Zhen, SU Yila, NIU Xianghua, GAO Fen, ZHAO Yaping, Ren Qing Daoer Ji. Application of a Domain Information Sharing Method in Mongolian-Chinese Machine Translation[J]. Computer Engineering and Applications, 2020, 56(10): 106-114. DOI: 10.3778/j.issn.1002-8331.1905-0122
Authors:ZHANG Zhen  SU Yila  NIU Xianghua  GAO Fen  ZHAO Yaping  Ren Qing Daoer Ji
Affiliation:School of Information Engineering, Inner Mongolia University of Technology, Hohhot 010000, China
Abstract: Mongolian-Chinese translation is a low-resource translation task that faces a scarcity of parallel corpus resources. To alleviate the low translation accuracy caused by scarce parallel data and a limited vocabulary, this paper uses the dynamic pre-training method ELMo (Embeddings from Language Models) combined with a Transformer translation architecture that shares domain information across multiple tasks. First, ELMo (deep contextualized word representations) is used to pre-train the monolingual corpus. Second, the FastText word-embedding algorithm pre-trains the large-scale context-dependent text in the Mongolian-Chinese parallel corpus. Then, following the principle of sharing parameters across tasks to realize domain information sharing, a one-to-many encoder-decoder model is constructed for Mongolian-Chinese neural machine translation. Experimental results show that the method improves translation quality over the Transformer baseline on long input sequences.
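The ELMo representations referred to above are, per the original ELMo formulation, task-specific weighted combinations of the biLM's layer outputs: softmax-normalized scalar weights over the layers plus a global scale γ. A small sketch of that mixing step, assuming the per-layer hidden states have already been computed by a pre-trained biLM:

```python
import numpy as np


def elmo_combine(layer_states, task_weights, gamma=1.0):
    """ELMo-style task-specific combination of biLM layers.

    layer_states : array of shape (L, T, D) - L layer outputs for T tokens.
    task_weights : array of shape (L,) - learned scalars, softmax-normalized.
    gamma        : learned global scale for the downstream task.
    Returns an array of shape (T, D).
    """
    w = np.exp(task_weights - task_weights.max())  # stable softmax
    w = w / w.sum()
    # Contract the layer axis: weighted sum over the L layer representations.
    return gamma * np.tensordot(w, layer_states, axes=1)
```

With all weights equal, the result is simply the mean of the layers; training lets the translation task emphasize whichever layers carry the most useful (e.g. syntactic vs. semantic) information.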
Keywords:Mongolian-Chinese translation  multi-task learning  Transformer  ELMo  FastText  
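The one-to-many encoder-decoder design can be caricatured as one shared parameter set feeding several task-specific heads: gradients from every task flow into the shared encoder, which is the mechanism by which domain information is shared. A toy numpy sketch of the parameter-sharing structure only (shapes, task names, and the linear layers are invented for illustration; the paper's model is a Transformer):

```python
import numpy as np

rng = np.random.default_rng(1)

# Shared encoder parameters: every task reads its input through the same
# matrix, so training any task updates this shared representation.
W_enc = rng.normal(size=(16, 16))

# Task-specific decoder heads - "one encoder, many decoders".
# Task names here are hypothetical labels, not from the paper.
decoders = {
    "task_a": rng.normal(size=(16, 32)),
    "task_b": rng.normal(size=(16, 32)),
}


def forward(x, task):
    h = np.tanh(x @ W_enc)     # shared encoding, common to all tasks
    return h @ decoders[task]  # task-specific projection
```

The key point is that `W_enc` appears in every task's forward pass while each entry of `decoders` appears in exactly one, mirroring the shared-encoder / private-decoder split of the one-to-many architecture.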
This article is indexed in Wanfang Data and other databases.