Research on out-of-vocabulary words' recognition in Uyghur-Chinese machine translation

Funding:	Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030400); "Western Doctor" project of the Chinese Academy of Sciences "Light of West China" talent training program (XBBS201216)

Citation:	MI Cheng-gang, WANG Lei, YANG Ya-ting, CHEN Ke-hai. Research on out-of-vocabulary words' recognition in Uyghur-Chinese machine translation[J]. Application Research of Computers, 2013, 30(4): 1112-1115.

Authors:	MI Cheng-gang, WANG Lei, YANG Ya-ting, CHEN Ke-hai

Affiliation:	1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China

Abstract:	To address the large number of out-of-vocabulary (OOV) words in Uyghur-Chinese statistical machine translation and the scarcity of Uyghur language resources, this paper proposes an OOV word recognition model based on string similarity, drawing on the word-formation features of Uyghur and corresponding string similarity algorithms. Using the phrase table of a phrase-based model together with an external dictionary, the model computes the string similarity between each untranslated Uyghur word and the Uyghur entries in the phrase table and dictionary, and takes the Chinese translation of the most similar entry as the final translation of the OOV word. Experiments show that, compared with an OOV word recognition method based on stem segmentation, the model better preserves the information carried by Uyghur words and improves translation quality.

Keywords:	Uyghur-Chinese machine translation; phrase table; string similarity algorithms; out-of-vocabulary words; word segmentation; edit distance
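
The lookup described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact similarity formula, so the normalized edit distance used here is an assumption, and the function names (`edit_distance`, `similarity`, `translate_oov`) and the toy Latin-script lexicon are hypothetical, not taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def translate_oov(word, lexicon, threshold=0.0):
    """Return the translation of the lexicon entry most similar to `word`.

    `lexicon` maps source-side entries (phrase table plus dictionary) to
    their translations. Returns None if the lexicon is empty or the best
    similarity falls below `threshold`.
    """
    best = max(lexicon, key=lambda s: similarity(word, s), default=None)
    if best is None or similarity(word, best) < threshold:
        return None
    return lexicon[best]


# Toy entries for illustration only (not data from the paper).
lexicon = {"kitab": "书", "mektep": "学校"}
result = translate_oov("kitablar", lexicon)
```

Here the suffixed form `kitablar` is closest to the entry `kitab`, so the sketch returns that entry's translation without first stripping the word down to a stem. The `threshold` parameter is an added safeguard, also an assumption, that lets a caller reject weak matches.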
This article is indexed in CNKI and other databases.