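Multi-System Combination for Machine Translation Based on Confusion Network Decoding
Cite this article: DU Jin-hua, WEI Wei, XU Bo. Multi-System Combination for Machine Translation Based on Confusion Network Decoding[J]. Journal of Chinese Information Processing, 2008, 22(4): 48-54
Authors: DU Jin-hua  WEI Wei  XU Bo
Affiliations: 1. Digital Content Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190;
2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Funding: National High Technology Research and Development Program of China (863 Program)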
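Abstract: Building on an analysis of several currently popular system combination methods for statistical machine translation, this paper proposes an improved multi-system combination framework that integrates two techniques: Minimum Bayes-Risk (MBR) decoding and multi-feature confusion network (CN) decoding. Combination proceeds as follows: (1) from the N-best outputs of multiple translation systems, an MBR decoder selects the hypothesis with the lowest risk as the alignment reference; (2) the remaining N-best hypotheses are aligned to this reference to build the confusion network. The multi-feature confusion network is based on a log-linear model and brings additional effective knowledge sources into the selection of the optimal path; the BLEU score after combination is 2.19% higher than that of the best single system before combination. For alignment, we propose an improved Translation Error Rate (TER) criterion, the GIZA-TER criterion, which enables more effective phrase reordering in the confusion network. Significance tests in the experiments confirm the effectiveness of the proposed method.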

Keywords: artificial intelligence  machine translation  multi-system combination  minimum Bayes-risk decoding  multi-feature confusion network  GIZA-TER
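
Step (1) of the framework, choosing the minimum-risk hypothesis as the alignment reference, can be illustrated with a minimal sketch. The abstract does not state which loss function or posterior estimates the authors use, so the code below makes two labeled assumptions: the loss is a length-normalized word-level edit distance (a TER-like loss without shifts), and the posteriors are supplied by the caller. The names `edit_distance` and `mbr_select` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete a word of a
                            curr[j - 1] + 1,      # insert a word of b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def mbr_select(hyps: List[str], posteriors: List[float]) -> str:
    """Return the hypothesis with the lowest expected loss against the pool.

    Assumption: the loss is edit distance divided by the length of the
    hypothesis being compared against; the paper may use a different loss.
    """
    tokens = [h.split() for h in hyps]
    best_idx, best_risk = 0, float("inf")
    for i, cand in enumerate(tokens):
        risk = sum(p * edit_distance(cand, other) / max(len(other), 1)
                   for other, p in zip(tokens, posteriors))
        if risk < best_risk:
            best_idx, best_risk = i, risk
    return hyps[best_idx]


if __name__ == "__main__":
    pool = ["the cat sat on the mat",
            "the cat sat on mat",
            "a cat is on the mat"]
    print(mbr_select(pool, [1 / 3, 1 / 3, 1 / 3]))
```

With uniform posteriors the selected hypothesis is the one closest, on average, to all the others, which is why it makes a reasonable alignment reference for the confusion network.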

Confusion Network Based System Combination for Statistical Machine Translation
DU Jin-hua,WEI Wei,XU Bo. Confusion Network Based System Combination for Statistical Machine Translation[J]. Journal of Chinese Information Processing, 2008, 22(4): 48-54
Authors:DU Jin-hua  WEI Wei  XU Bo
Affiliation:1. Digital Content Technology Research Center, Institute of Automation,
Chinese Academy of Sciences, Beijing 100190, China;
2. National Laboratory of Pattern Recognition, Institute of Automation,
Chinese Academy of Sciences, Beijing 100190, China
Abstract: Building on an analysis of several popular system combination methods for statistical machine translation, an improved multi-system combination framework is proposed. The framework integrates Minimum Bayes-Risk (MBR) decoding and multi-feature Confusion Network (CN) decoding in the following steps: (1) MBR decoding selects the minimum-risk hypothesis from the N-best results produced by the translation systems as the alignment reference; (2) the CN is constructed by aligning the other hypotheses with this reference. Based on a log-linear model, the CN introduces more knowledge sources into the selection of the optimal path. Compared with the best single system before combination, the proposed framework achieves a 2.19% improvement in BLEU score. In addition, we present GIZA-TER, a modified Translation Edit Rate (TER) metric for CN alignment, which facilitates more effective phrase reordering. Significance tests demonstrate the validity of the proposed methods.
Keywords: artificial intelligence  machine translation  system combination  minimum Bayes-risk decoding  multi-feature confusion network  GIZA-TER
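
Step (2), building a confusion network around the reference and choosing a path with a log-linear score, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' system: alignment is plain word-level Levenshtein alignment rather than GIZA-TER (so there is no phrase reordering), words inserted relative to the reference are dropped, and the log-linear model has only two hypothetical features (a log vote share and a bonus for non-empty words). The abstract does not list the actual feature set. Because neither feature spans more than one slot, the best path reduces to a per-slot argmax.

```python
import math
from collections import defaultdict
from typing import Dict, List

EPS = "<eps>"  # empty-word arc in the confusion network


def align_to_skeleton(skel: List[str], hyp: List[str]) -> List[str]:
    """For each skeleton position, return the aligned hyp word or EPS.

    Assumption: plain Levenshtein alignment; hyp words that are
    insertions relative to the skeleton are dropped in this sketch.
    """
    n, m = len(skel), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (skel[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    aligned = [EPS] * n
    i, j = n, m
    while i > 0 and j > 0:  # backtrace from the bottom-right cell
        if d[i][j] == d[i - 1][j - 1] + (skel[i - 1] != hyp[j - 1]):
            aligned[i - 1] = hyp[j - 1]
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1  # skeleton word has no counterpart: stays EPS
        else:
            j -= 1  # extra hyp word: dropped
    return aligned


def decode(skel: str, others: List[str], weights: List[float],
           lam_vote: float = 1.0, lam_word: float = 0.1) -> str:
    """Pick the best word per slot under a two-feature log-linear score.

    weights[0] belongs to the skeleton, weights[1:] to `others`;
    all weights are assumed to be positive.
    """
    skel_toks = skel.split()
    rows = [skel_toks] + [align_to_skeleton(skel_toks, h.split()) for h in others]
    slots: List[Dict[str, float]] = [defaultdict(float) for _ in skel_toks]
    for row, w in zip(rows, weights):
        for slot, word in zip(slots, row):
            slot[word] += w  # accumulate weighted votes per candidate word
    out = []
    for slot in slots:
        total = sum(slot.values())
        best = max(slot, key=lambda wd: lam_vote * math.log(slot[wd] / total)
                   + lam_word * (wd != EPS))
        if best != EPS:
            out.append(best)
    return " ".join(out)


if __name__ == "__main__":
    print(decode("the cat sit on the mat",
                 ["the cat sat on the mat", "a cat sat on the mat"],
                 [1.0, 1.0, 1.0]))
```

In this toy run the reference contains the error "sit", and the two other hypotheses outvote it in that slot. A fuller system would add features such as a language model score, at which point the slots are no longer independent and a search over paths is needed.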
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.