Domain Information Sharing Method in Mongolian-Chinese Machine Translation Application

Cite this article:	ZHANG Zhen, SU Yila, NIU Xianghua, GAO Fen, ZHAO Yaping, Ren Qing Daoer Ji. Domain Information Sharing Method in Mongolian-Chinese Machine Translation Application[J]. Computer Engineering and Applications, 2020, 56(10): 106-114. DOI: 10.3778/j.issn.1002-8331.1905-0122

Authors:	ZHANG Zhen, SU Yila, NIU Xianghua, GAO Fen, ZHAO Yaping, Ren Qing Daoer Ji

Affiliation:	School of Information Engineering, Inner Mongolia University of Technology, Hohhot 010000, China

Funding:	Fund of the Ethnic Affairs Commission of Inner Mongolia Autonomous Region; National Natural Science Foundation of China; Natural Science Foundation of Inner Mongolia Autonomous Region

Abstract:	Mongolian-Chinese translation is a low-resource translation task that suffers from a scarcity of parallel corpora. To alleviate the low translation accuracy caused by scarce parallel data and a limited vocabulary, this paper applies the dynamic pre-training method ELMo (Embeddings from Language Models) and combines it with a Transformer translation architecture that shares domain information across multiple tasks. First, ELMo (deep contextualized word representations) is used to pre-train on the monolingual corpus. Second, the FastText word embedding algorithm is used to pre-train on the large-scale, context-dependent text in the Mongolian-Chinese parallel corpus. Then, following the principle that sharing parameters across tasks realizes domain information sharing, a one-to-many encoder-decoder model is constructed for Mongolian-Chinese neural machine translation. Experimental results show that, compared with the Transformer baseline, this method effectively improves translation quality on long input sentences.

Keywords:	Mongolian-Chinese translation; multi-task learning; Transformer; ELMo; FastText
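
The abstract links low translation accuracy partly to a limited vocabulary and names FastText as one of the embedding methods. As a rough illustration of why subword-based embeddings help with rare or unseen words, the sketch below enumerates FastText-style character n-grams for a word. It is not the authors' implementation: the function name `char_ngrams`, the English example words, and the 3-to-6 n-gram range are assumptions chosen for illustration, and the real FastText library additionally hashes n-grams into a fixed number of buckets, which is omitted here.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Enumerate FastText-style character n-grams of `word`.

    The word is wrapped in the boundary markers '<' and '>' so that
    prefixes and suffixes are distinguishable from word-internal n-grams.
    (Illustrative helper; name and defaults are assumptions.)
    """
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    # The full marked word is also kept as its own unit.
    if marked not in grams:
        grams.append(marked)
    return grams


def shared_ngrams(word_a, word_b, n_min=3, n_max=6):
    """Return the n-grams two words have in common, sorted for readability."""
    a = set(char_ngrams(word_a, n_min, n_max))
    b = set(char_ngrams(word_b, n_min, n_max))
    return sorted(a & b)


if __name__ == "__main__":
    print(char_ngrams("where", 3, 3))
    print(shared_ngrams("translate", "translation"))
```

Because a word's vector is built from the vectors of its n-grams, two words that share many n-grams (such as inflected forms of one stem) receive related representations even when one of them is rare in the training corpus.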
This article is indexed in Wanfang Data and other databases.