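Unknown Words Processing Method for Chinese-Vietnamese Neural Machine Translation Based on Hybrid Network Integrating Classification Dictionaries
Cite this article:CHE Wanjin, YU Zhengtao, GUO Junjun, WEN Yonghua, YU Zhiqiang. Unknown Words Processing Method for Chinese-Vietnamese Neural Machine Translation Based on Hybrid Network Integrating Classification Dictionaries[J]. Journal of Chinese Information Processing, 2019, 33(12): 67-75.
Authors:CHE Wanjin  YU Zhengtao  GUO Junjun  WEN Yonghua  YU Zhiqiang
Affiliation:1.Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China;
2.Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Funding:National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830100); National Natural Science Foundation of China (61732005, 61672271, 61761026, 61762056, 61866020); Yunnan Province High-Tech Industry Special Project (201606); Natural Science Foundation of Yunnan Province (2018FB104); Yunnan Province Science and Technology Talent Training Project (KKSY201703015)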
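Abstract:In neural machine translation (NMT), the unknown-word (out-of-vocabulary) problem caused by a limited vocabulary substantially reduces translation accuracy. The problem is more severe for resource-scarce languages, for which little training data is available. Inspired by recent work on integrating external knowledge, this paper builds on the RNNSearch model and proposes a method for handling unknown words in Chinese-Vietnamese NMT with a hybrid network that incorporates a classification dictionary. For a given source sentence, the method scans the classification dictionary to identify candidate phrase pairs and marks them with tags. The decoder then uses a hybrid decoding network, with a word-level component and a phrase component, to generate translations for both single-word and phrase-level unknown words, thereby improving Chinese-Vietnamese NMT. Experiments on Chinese-Vietnamese, English-Vietnamese and Mongolian-Chinese translation show that the method significantly improves accuracy and brings some improvement to NMT for resource-scarce languages.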

Keywords:neural machine translation  classification dictionary  resource-scarce  unknown words

Unknown Words Processing Method for Chinese-Vietnamese Neural Machine Translation Based on Hybrid Network Integrating Classification Dictionaries
CHE Wanjin,YU Zhengtao,GUO Junjun,WEN Yonghua,YU Zhiqiang.Unknown Words Processing Method for Chinese-Vietnamese Neural Machine Translation Based on Hybrid Network Integrating Classification Dictionaries[J].Journal of Chinese Information Processing,2019,33(12):67-75.
Authors:CHE Wanjin  YU Zhengtao  GUO Junjun  WEN Yonghua  YU Zhiqiang
Affiliation:1.Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China;
2.Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Abstract:In neural machine translation (NMT), the unknown-word problem caused by a limited vocabulary significantly degrades translation quality. Inspired by work on integrating external knowledge, this paper improves the RNNSearch NMT model by incorporating a classification dictionary and proposes a new hybrid network to handle unknown words in Chinese-Vietnamese NMT. For each source sentence, the model scans the classification dictionary to determine candidate phrase pairs and tags them; the decoder then uses a hybrid network with both word-level and phrase-level components to generate the translation. Experiments on Chinese-Vietnamese, English-Vietnamese and Mongolian-Chinese NMT show that the method significantly improves translation performance.
Keywords:neural machine translation  classification dictionaries  resource-scarce  unknown words  
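
Illustrative sketch (not from the paper):the dictionary-scanning step described in the abstract could look roughly like the Python below. The abstract does not give the tag format, the dictionary layout, or the matching strategy, so the `<CATEGORY>...</CATEGORY>` tags, the `tag_candidates` function, the `max_len` parameter, and the greedy longest-match scan are all assumptions made for illustration. The hybrid word/phrase decoder is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary layout: source phrase (as a token tuple) ->
# (category label, target-side translation). This layout is an assumption.
ClassDict = Dict[Tuple[str, ...], Tuple[str, str]]


def tag_candidates(
    tokens: List[str], dictionary: ClassDict, max_len: int = 4
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Greedy longest-match scan of a tokenized source sentence.

    Returns the token list with matched spans wrapped in category tags,
    plus the list of (source phrase, target phrase) candidate pairs.
    """
    tagged: List[str] = []
    pairs: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest span first so phrases win over their sub-words.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in dictionary:
                match_len = n
                break
        if match_len:
            span = tuple(tokens[i:i + match_len])
            category, target = dictionary[span]
            tagged.append(f"<{category}>")
            tagged.extend(span)
            tagged.append(f"</{category}>")
            pairs.append((" ".join(span), target))
            i += match_len
        else:
            tagged.append(tokens[i])
            i += 1
    return tagged, pairs


# Toy example with a single hand-written entry (not data from the paper).
toy_dict: ClassDict = {("胡志明", "市"): ("LOC", "Thành phố Hồ Chí Minh")}
tagged_tokens, candidate_pairs = tag_candidates(["我", "去", "胡志明", "市"], toy_dict)
print(tagged_tokens)
print(candidate_pairs)
```

In a setup like this, the tagged sentence would be fed to the encoder, and the candidate pairs would be made available to the phrase component of the decoder. How the paper actually connects the two is not stated in the abstract.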