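A Survey of Sentence Alignment
Cite this article: HUANG Jiayue, XIONG Deyi. A Survey of Sentence Alignment[J]. Journal of Chinese Information Processing, 2021, 35(8): 16-27.
Authors: HUANG Jiayue, XIONG Deyi
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Funding: Excellent Young Scientists Fund of the National Natural Science Foundation of China (61622209)
Abstract: Neural machine translation is currently the dominant approach in machine translation, and a sufficient amount of bilingual parallel data is a prerequisite for training a good translation model. Bilingual sentence alignment, a technique for obtaining parallel sentence pairs from monolingual corpora in different languages, has therefore been widely studied. This paper first briefly introduces the sentence alignment task and its evaluation criteria, then summarizes prior research progress on the task together with related information about it, and briefly outlines the systems submitted by participating teams. It concludes by summarizing current work and discussing directions for future work.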

Keywords: neural machine translation; sentence alignment
Received: 2020-03-05

A Survey of Sentence Alignment
HUANG Jiayue, XIONG Deyi. A Survey of Sentence Alignment[J]. Journal of Chinese Information Processing, 2021, 35(8): 16-27.
Authors: HUANG Jiayue, XIONG Deyi
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Abstract: Neural machine translation achieves good results on language pairs with abundant corpora but performs poorly on pairs with scarce bilingual resources, such as Chinese-Vietnamese. This problem can be alleviated by generating pseudo-parallel sentence pairs through word-level replacement in existing small-scale bilingual data. Because word-level substitution between Chinese and Vietnamese suffers from one word having multiple translations, we study replacement at a larger granularity and propose a method for generating Chinese-Vietnamese pseudo-parallel sentence pairs based on phrase substitution. Phrases are extracted from small-scale bilingual data to construct a phrase alignment table, which is expanded with entity phrases extracted from Wikipedia. After phrase recognition is performed on the Chinese and Vietnamese bilingual data, each recognized phrase is replaced with a phrase pair from the phrase alignment table that is similar to it, which achieves phrase-level data augmentation. The final neural machine translation model is trained on the generated pseudo-parallel sentence pairs together with the original data. Experimental results on Chinese-Vietnamese translation tasks show that pseudo-parallel sentence pairs generated by phrase substitution can effectively improve the performance of Chinese-Vietnamese neural machine translation.
Keywords: neural machine translation; sentence alignment