Multi-granularity Metamorphic Testing for Neural Machine Translation System
Cite this article: ZHONG Wen-Kang, GE Ji-Dong, CHEN Xiang, LI Chuan-Yi, TANG Ze, LUO Bin. Multi-granularity Metamorphic Testing for Neural Machine Translation System[J]. Journal of Software, 2021, 32(4): 1051-1066.
Authors: ZHONG Wen-Kang, GE Ji-Dong, CHEN Xiang, LI Chuan-Yi, TANG Ze, LUO Bin
Affiliation: State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; School of Information Science and Technology, Nantong University, Nantong 226019, China
Funding: National Natural Science Foundation of China (61802167, 61972197, 61802095); Natural Science Foundation of Jiangsu Province (BK20201250)
Abstract: Machine translation is the task of using computers to convert one natural language into another, and it is one of the active research topics in artificial intelligence. In recent years, with the development of deep learning, neural machine translation models based on the sequence-to-sequence architecture have outperformed statistical machine translation models on translation tasks for many language pairs and have been widely deployed in commercial translation systems. Although the practical performance of commercial translation systems shows that neural machine translation models have improved substantially, systematically evaluating their translation quality remains challenging. On the one hand, if translation quality is evaluated against reference translations, obtaining high-quality references is very expensive. On the other hand, compared with statistical machine translation models, neural machine translation models suffer from more pronounced robustness problems, yet there has been no study of the robustness of neural machine translation models. To address these challenges, this paper proposes a multi-granularity testing framework based on metamorphic testing, which evaluates the translation quality and robustness of neural machine translation systems without reference translations. The framework first replaces parts of the source sentence at sentence granularity, phrase granularity, and word granularity; it then computes the similarity between the translations of the original and the replaced sentences based on edit distance and constituency parse trees; finally, it judges whether the translation results satisfy the metamorphic relation according to the similarity. Comparative experiments with different metamorphic testing frameworks were conducted on six mainstream commercial neural machine translation systems using Chinese-English datasets from five domains: education, microblogs, news, spoken language, and subtitles. The experimental results show that the proposed method achieves Pearson and Spearman correlation coefficients with the reference-based method that are 80% and 20% higher, respectively, than those of methods of the same type, indicating that the proposed reference-free evaluation method correlates more strongly with reference-based evaluation and that its evaluation accuracy is significantly better than that of other methods of the same type.
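The metamorphic check described above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example of the word-granularity case only: translate() stands for the system under test, the replacement strategy and the 0.5 similarity threshold are illustrative assumptions, and only the normalized edit-distance similarity mentioned in the abstract is shown (the constituency-parse-tree comparison is omitted).

# Minimal sketch of a word-granularity metamorphic check (assumptions noted above).
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two whitespace-tokenized sentences."""
    ta, tb = a.split(), b.split()
    dp = list(range(len(tb) + 1))          # dp[j] = distance for prefixes
    for i, x in enumerate(ta, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(tb, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (x != y))  # substitution
    return dp[-1]

def similarity(a: str, b: str) -> float:
    """Edit-distance similarity normalized to [0, 1]."""
    denom = max(len(a.split()), len(b.split())) or 1
    return 1.0 - edit_distance(a, b) / denom

def word_granularity_check(source: str, replaced: str, translate, threshold: float = 0.5) -> bool:
    """Metamorphic relation (sketch): translating the source sentence and a
    sentence with one word replaced should yield similar translations."""
    return similarity(translate(source), translate(replaced)) >= threshold

A violation of the relation (similarity below the threshold) is treated as a suspicious translation, which is what allows the framework to flag quality and robustness issues without any reference translation.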

Keywords: neural network; machine translation; quality estimation; metamorphic testing; multi-granularity
Received: 2020-09-12
Revised: 2020-10-26

Multi-granularity Metamorphic Testing for Neural Machine Translation System
ZHONG Wen-Kang, GE Ji-Dong, CHEN Xiang, LI Chuan-Yi, TANG Ze, LUO Bin. Multi-granularity Metamorphic Testing for Neural Machine Translation System[J]. Journal of Software, 2021, 32(4): 1051-1066.
Authors: ZHONG Wen-Kang, GE Ji-Dong, CHEN Xiang, LI Chuan-Yi, TANG Ze, LUO Bin
Affiliation: State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; School of Information Science and Technology, Nantong University, Nantong 226019, China
Abstract: Machine translation is the task of converting one natural language into another. In recent years, neural machine translation models based on the sequence-to-sequence architecture have achieved better performance than traditional statistical machine translation models on multiple language pairs and have been adopted by many translation service providers. Although the practical performance of commercial translation systems shows that neural machine translation models have improved greatly, systematically evaluating their translation quality remains a challenging task. On the one hand, if translation quality is evaluated against reference translations, the cost of acquiring high-quality references is very high. On the other hand, compared with statistical machine translation models, neural machine translation models have more significant robustness problems, yet there are no related studies on their robustness. This paper proposes MGMT, a multi-granularity testing framework based on metamorphic testing, which can evaluate the robustness of neural machine translation systems without reference translations. The framework first replaces the source sentence at sentence granularity, phrase granularity, and word granularity; it then compares the translation results of the source sentence and the replaced sentences based on the constituency parse tree; finally, it judges whether the results satisfy the metamorphic relation. We conducted experiments on multi-domain Chinese-English translation datasets, evaluated six industrial neural machine translation systems, and compared MGMT with metamorphic testing methods of the same type and with methods based on reference translations. The experimental results show that MGMT is 80% and 20% higher than similar methods in terms of Pearson's correlation coefficient and Spearman's correlation coefficient, respectively. This indicates that the reference-free evaluation method proposed in this paper has a higher positive correlation with reference-based evaluation, verifying that MGMT's evaluation accuracy is significantly better than that of other methods of the same type.
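The reported comparison against reference-based evaluation measures agreement with Pearson and Spearman correlation coefficients. A minimal sketch of such a computation follows; the score lists are hypothetical placeholders and scipy is an assumed dependency, as the paper's exact scoring pipeline is not reproduced here.

# Sketch: correlating per-sentence reference-free scores with reference-based scores.
from scipy.stats import pearsonr, spearmanr

metamorphic_scores = [0.92, 0.31, 0.77, 0.58]   # hypothetical reference-free (MGMT-style) scores
reference_scores   = [0.88, 0.25, 0.81, 0.49]   # hypothetical reference-based scores (e.g., sentence-level metric)

pearson, _ = pearsonr(metamorphic_scores, reference_scores)
spearman, _ = spearmanr(metamorphic_scores, reference_scores)
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")

Higher correlation with the reference-based scores is the criterion by which the reference-free evaluation is judged in the experiments.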
Keywords: neural network; machine translation; quality estimation; metamorphic testing; multi-granularity