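Chinese-Uyghur Statistical Machine Translation Based on Syntactical Reordering
Cite this article: CHEN Li-juan, ZHANG Heng, DONG Xing-hua, Turghun Osman, ZHOU Jun-lin. Chinese-Uyghur Statistical Machine Translation Based on Syntactical Reordering[J]. Computer Engineering, 2012, 38(3): 169-171, 175.
Authors: CHEN Li-juan  ZHANG Heng  DONG Xing-hua  Turghun Osman  ZHOU Jun-lin
Affiliations: 1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011
2. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011; Graduate University of Chinese Academy of Sciences, Beijing 100049
3. Xinjiang Branch of Chinese Academy of Sciences, Urumqi 830011
Funding: High-Tech Fund project of the Western Action Plan of the Chinese Academy of Sciences (KGCX2-YN-507)
Abstract: In Chinese-to-Uyghur statistical machine translation, the two languages differ considerably in morphology and word order, which results in many unknown words and disordered word order in the Uyghur output. To address these problems, a Chinese syntactic reordering method is proposed on the basis of a study of Chinese and Uyghur word order. Morphological analysis is then performed on Uyghur, and the approach is validated with a factor-based statistical machine translation system. Experimental results show that the method improves significantly on the baseline system, raising the BLEU score from 15.72 to 19.17.

Keywords: statistical machine translation  syntactic reordering  morphology  factored model  translation model
Received: 2011-07-25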

Chinese-Uyghur Statistical Machine Translation Based on Syntactical Reordering
CHEN Li-juan, ZHANG Heng, DONG Xing-hua, Turghun Osman, ZHOU Jun-lin. Chinese-Uyghur Statistical Machine Translation Based on Syntactical Reordering[J]. Computer Engineering, 2012, 38(3): 169-171, 175.
Authors:CHEN Li-juan  ZHANG Heng  DONG Xing-hua  Turghun Osman  ZHOU Jun-lin
Affiliation: 1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China; 3. Xinjiang Branch of Chinese Academy of Sciences, Urumqi 830011, China
Abstract: Chinese and Uyghur differ greatly in morphological typology and word order, which leads to many unknown words and confused word order in the Uyghur output when translating from Chinese to Uyghur with statistical methods. Based on a study of Chinese and Uyghur word order, a Chinese syntactic reordering method is proposed, and Uyghur morphological information is analyzed to address these difficulties. Experimental results on a factor-based SMT system show that the approach achieves a substantial improvement in translation quality over the baseline phrase-based system, raising the BLEU score from 15.72 to 19.17.
Keywords: Statistical Machine Translation (SMT)  syntactic reordering  morphology  factored model  translation model
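The abstract does not spell out the reordering rules, so the following is only a minimal sketch of the general idea behind source-side syntactic reordering, not the authors' method. It assumes a toy parse tree with Penn Chinese Treebank style labels (`IP`, `NP`, `VP`, `VV`) and a single hypothetical rule: inside each `VP`, move the verb after its other children, so that a verb-object Chinese clause is rearranged toward the verb-final order of Uyghur. The names `Node`, `reorder_vp`, and `leaves` are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A parse-tree node; children are sub-nodes or leaf words (str)."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def reorder_vp(node):
    """Return a copy of the tree in which every VP is verb-final.

    Hypothetical rule for illustration: children labelled VV are moved
    to the end of their VP; all other children keep their relative order.
    """
    if isinstance(node, str):
        return node
    kids = [reorder_vp(child) for child in node.children]
    if node.label == "VP":
        verbs, others = [], []
        for child in kids:
            is_verb = isinstance(child, Node) and child.label == "VV"
            (verbs if is_verb else others).append(child)
        kids = others + verbs
    return Node(node.label, kids)


def leaves(node):
    """Collect the leaf words of a tree from left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node.children:
        words.extend(leaves(child))
    return words


# Toy example: 我 读 书 ("I read book"), subject-verb-object order.
tree = Node("IP", [
    Node("NP", [Node("PN", ["我"])]),
    Node("VP", [Node("VV", ["读"]),
                Node("NP", [Node("NN", ["书"])])]),
])

print(leaves(reorder_vp(tree)))  # ['我', '书', '读'] (subject-object-verb)
```

In a pipeline of this kind, the reordered Chinese side would typically replace the original source sentences in both training and test data before word alignment, so that the alignments become closer to monotone.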
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.