
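Filtering Statistical Machine Translation Phrase Tables Based on Virtual Context
Cite this article: YIN Yue, ZHANG Yujie, XU Jinan. Filtering Statistical Machine Translation Phrase Tables Based on Virtual Context[J]. Journal of Chinese Information Processing, 2013, 27(6): 139-144
Authors: YIN Yue  ZHANG Yujie  XU Jinan
Affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
Funding: Beijing Jiaotong University Talent Fund (KKRC11001532)
Abstract: In phrase-based statistical machine translation systems, the automatically extracted phrase table inevitably contains a large number of redundant and erroneous phrase pairs, which wastes decoding resources and harms translation quality. To alleviate this problem, this paper proposes a phrase-table filtering method based on virtual context. The method introduces virtual context to compute the score increment of each phrase pair and, by computing the maximum and minimum score increments, designs a filtering strategy that re-ranks the phrase pairs. We ran validation experiments on the NTCIR-9 Chinese-English data. The results show that when the phrase table shrinks to 47% of its original size, the BLEU score of the translations rises by 0.0005; when it shrinks to 30% of its original size, BLEU falls by only 0.0006. The experiments indicate that the method is effective and feasible for filtering large-scale phrase tables.

Key words: phrase-based statistical machine translation  phrase table filtering  virtual context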

Phrase Table Filtration Based on Virtual Context in Phrased-Based Statistical Machine Translation
YIN Yue,ZHANG Yujie,XU Jinan. Phrase Table Filtration Based on Virtual Context in Phrased-Based Statistical Machine Translation[J]. Journal of Chinese Information Processing, 2013, 27(6): 139-144
Authors:YIN Yue  ZHANG Yujie  XU Jinan
Affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Abstract: In phrase-based statistical machine translation systems, the automatically extracted phrase table inevitably contains many redundant and erroneous phrase pairs, which waste decoding time and space and degrade translation quality. To alleviate this problem, we propose a phrase-table filtering method that introduces virtual context to compute the increment in a phrase pair's score contributed by the language model. Using the maximum and minimum score increments over the virtual contexts, we design a filtering strategy that re-ranks the phrase pairs. We verified the method on the NTCIR-9 Chinese-English data. When the phrase table was reduced to 47% of its original size, the BLEU score rose by 0.0005; when it was reduced to 30% of its original size, BLEU dropped by only 0.0006. The results indicate that the method is effective and feasible for filtering redundant phrase pairs out of large-scale phrase tables.
Key words: phrase-based statistical machine translation, phrase table filtering, virtual context
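The abstract gives only the outline of the filtering strategy, so the following Python sketch is a loose illustration rather than the authors' algorithm. It assumes a toy bigram language model, takes the "score increment" to be the log-probability of a target phrase's first word given one virtual preceding word, and re-ranks by adding the mean of the maximum and minimum increments to a base score. The names `score_increments`, `filter_phrase_table`, `UNSEEN_LOGPROB`, the combination rule, and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed back-off log-probability for bigrams missing from the toy model.
UNSEEN_LOGPROB = -6.0


def score_increments(
    target: str,
    contexts: List[str],
    bigram_logprob: Dict[Tuple[str, str], float],
) -> Tuple[float, float]:
    """Return the (max, min) increment over all virtual contexts.

    Assumption: the increment for one context word is the bigram
    log-probability of the phrase's first word following that context.
    """
    first_word = target.split()[0]
    increments = [
        bigram_logprob.get((ctx, first_word), UNSEEN_LOGPROB) for ctx in contexts
    ]
    return max(increments), min(increments)


def filter_phrase_table(
    table: List[Tuple[str, str, float]],
    contexts: List[str],
    bigram_logprob: Dict[Tuple[str, str], float],
    keep_ratio: float,
) -> List[Tuple[str, str]]:
    """Re-rank phrase pairs and keep the top `keep_ratio` fraction.

    Assumption: re-ranking score = base score + mean of the max and min
    increments. The paper's actual combination rule is not given in the abstract.
    """
    ranked = []
    for source, target, base_score in table:
        best, worst = score_increments(target, contexts, bigram_logprob)
        ranked.append((base_score + 0.5 * (best + worst), source, target))
    ranked.sort(key=lambda item: item[0], reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return [(source, target) for _, source, target in ranked[:keep]]


# Hypothetical toy data: (source phrase, target phrase, base log-score).
example_table = [
    ("mao", "the cat", -1.0),
    ("mao", "cat the", -1.1),
    ("mao", "a cat", -1.5),
    ("mao", "cats of", -1.2),
]
example_contexts = ["<s>", "saw"]
example_bigrams = {
    ("<s>", "the"): -1.0,
    ("saw", "the"): -0.5,
    ("<s>", "a"): -1.5,
    ("saw", "a"): -0.8,
    ("<s>", "cat"): -4.0,
}

kept = filter_phrase_table(example_table, example_contexts, example_bigrams, 0.5)
print(kept)
```

Here `keep_ratio` plays the role of the table-size reduction reported in the abstract (for example 0.47 or 0.30); a real system would read the scores from the decoder's phrase table and a trained language model instead of the toy dictionaries.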