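A Sub-Word Based Term Alignment Method for Historical Classics
Citation: CHE Chao, ZHENG Xiaojun. A sub-word based term alignment method for historical classics [J]. Journal of Chinese Information Processing, 2016, 30(3): 46-51.
Authors: CHE Chao, ZHENG Xiaojun
Affiliations: 1. Key Laboratory of Advanced Design and Intelligent Computing (Ministry of Education, jointly built with the province), Dalian University, Dalian, Liaoning 116024;
2. School of Mechanical Engineering, Dalian Jiaotong University, Dalian, Liaoning 116028
Funding: National Natural Science Foundation of China (61402068, 61304206)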
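Abstract: Terms in historical classics are widely polysemous, and no word segmentation algorithm exists for classical Chinese. Both facts make it very difficult to obtain translation pairs for classics terms automatically with alignment methods based on bilingual parallel corpora. To address these problems, this paper proposes a sub-word based maximum entropy model for aligning classics terms. The method combines two kinds of statistics to extract characters that frequently occur together as sub-words, and then uses these sub-words to segment the classics, which removes the need for a classical Chinese segmentation algorithm. To handle the polysemy of classics terms, transliteration feature functions are designed from the transliteration patterns of the terms and combined with other features in a maximum entropy model that determines each term's translation. Experiments on a bilingual parallel corpus of the Shiji (Records of the Grand Historian) show that the sub-word method far outperforms the method without sub-words, and that the maximum entropy model combining three kinds of features effectively improves term alignment accuracy.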


Keywords: sub-word; term alignment; maximum entropy model; transliteration feature
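The abstract says a maximum entropy model combines a transliteration feature with other features to pick a term's translation, but it does not list the other two features or the training procedure. The sketch below shows only the general conditional max-ent (log-linear) scoring, P(e | c) = exp(sum_i w_i * f_i(c, e)) / Z(c), over a fixed candidate list. Everything else is a hypothetical assumption: the three feature functions (`translit_feature`, `dice_feature`, `capital_feature`), the tiny pinyin table, the toy co-occurrence counts, and the hand-set weights. None of these come from the paper.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy statistics, for illustration only: counts of aligned sentence
# pairs containing the Chinese term, the English candidate, or both.
Stats = Dict[str, Dict]
STATS: Stats = {
    "zh": {"项羽": 10},
    "en": {"Xiang Yu": 9, "feather": 5, "general": 40},
    "cooc": {("项羽", "Xiang Yu"): 8, ("项羽", "feather"): 3, ("项羽", "general"): 9},
}

# Hypothetical mini pinyin table; a real system needs a full character-to-pinyin lexicon.
PINYIN: Dict[str, str] = {"项": "xiang", "羽": "yu", "刘": "liu", "邦": "bang"}


def translit_feature(term: str, cand: str, stats: Stats) -> float:
    """Share of the term's pinyin syllables found in the candidate (naive substring test)."""
    target = "".join(ch for ch in cand.lower() if ch.isalpha())
    syllables = [PINYIN[ch] for ch in term if ch in PINYIN]
    if not syllables:
        return 0.0
    return sum(s in target for s in syllables) / len(syllables)


def dice_feature(term: str, cand: str, stats: Stats) -> float:
    """Dice coefficient of term/candidate co-occurrence in aligned sentence pairs."""
    denom = stats["zh"].get(term, 0) + stats["en"].get(cand, 0)
    return 2.0 * stats["cooc"].get((term, cand), 0) / denom if denom else 0.0


def capital_feature(term: str, cand: str, stats: Stats) -> float:
    """1.0 if every English word is capitalized (a cue for proper-name translations)."""
    words = cand.split()
    return 1.0 if words and all(w[0].isupper() for w in words) else 0.0


Feature = Callable[[str, str, Stats], float]
FEATURES: List[Feature] = [translit_feature, dice_feature, capital_feature]
# Hand-set weights; a trained max-ent model would estimate them (e.g. with GIS or L-BFGS).
WEIGHTS: List[float] = [2.0, 1.5, 0.5]


def maxent_probs(term: str, candidates: List[str], features: List[Feature],
                 weights: List[float], stats: Stats) -> Dict[str, float]:
    """P(e | c) = exp(sum_i w_i * f_i(c, e)) / Z(c), normalized over the candidates."""
    scores = [sum(w * f(term, c, stats) for f, w in zip(features, weights))
              for c in candidates]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {c: e / z for c, e in zip(candidates, exps)}


if __name__ == "__main__":
    probs = maxent_probs("项羽", ["Xiang Yu", "feather", "general"], FEATURES, WEIGHTS, STATS)
    for cand, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f"{cand}\t{p:.3f}")
```

With these toy inputs "Xiang Yu" gets the highest probability. The literal reading of 羽 as "feather" co-occurs with the term about as often as "general" does, so mainly the transliteration feature separates the correct name from the distractors.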
  

Sub-Word Based Translation Extraction for Terms in Chinese Historical Classics
CHE Chao, ZHENG Xiaojun. Sub-Word Based Translation Extraction for Terms in Chinese Historical Classics [J]. Journal of Chinese Information Processing, 2016, 30(3): 46-51.
Authors: CHE Chao, ZHENG Xiaojun
Affiliation: 1. Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education,
Dalian University, Dalian, Liaoning 116024, China;
2. School of Mechanical Engineering, Dalian Jiaotong University, Dalian, Liaoning 116028, China
Abstract: Extracting term translation pairs from parallel corpora of historical classics is difficult because there is no suitable word segmentation method for ancient Chinese. In this paper we introduce a term alignment method that uses a maximum entropy model based on sub-words. In our approach, we first extract frequently co-occurring character pairs as sub-words using chi-square statistics and the log-likelihood ratio test, and use them to segment the Chinese text. We then build transliteration features from the transliteration patterns of classics terms and perform term alignment with a maximum entropy model. The sub-words compensate for the lack of a word segmentation method for ancient Chinese, and the maximum entropy model, which integrates three kinds of features, deals with the polysemy of terms. Experiments on the parallel corpus of the Shi Ji show that sub-words are effective, yielding a large performance improvement over IBM Model 4.
Keywords: sub-words; term alignment; maximum entropy model; transliteration
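The English abstract names the two association statistics used to extract sub-words (chi-square and the log-likelihood ratio). It does not say how they are combined, what units they are computed over, or how the segmenter applies the sub-words. The sketch below fills those gaps with stated assumptions. It uses adjacent character bigrams within a sentence and keeps a pair only when it occurs at least twice and both statistics reach 3.84, the chi-square critical value at p = 0.05 with one degree of freedom. Segmentation is greedy left to right. The function names, thresholds, and toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List, Set


def chi_square(o11: int, o12: int, o21: int, o22: int) -> float:
    """Pearson chi-square for the 2x2 table of an adjacent character pair (a, b)."""
    n = o11 + o12 + o21 + o22
    denom = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    return n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0


def llr(o11: int, o12: int, o21: int, o22: int) -> float:
    """Dunning's log-likelihood ratio G^2 = 2 * sum O * ln(O / E) over the same table."""
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    g2 = 0.0
    for o, r, c in ((o11, 0, 0), (o12, 0, 1), (o21, 1, 0), (o22, 1, 1)):
        if o > 0:  # empty cells contribute nothing to G^2
            g2 += o * math.log(o / (rows[r] * cols[c] / n))
    return 2.0 * g2


def extract_subwords(sentences: List[str], min_count: int = 2,
                     threshold: float = 3.84) -> Set[str]:
    """Keep character bigrams whose chi-square AND G^2 both reach the threshold."""
    pairs = Counter(s[i:i + 2] for s in sentences for i in range(len(s) - 1))
    first, second = Counter(), Counter()
    for bg, k in pairs.items():
        first[bg[0]] += k   # how often each character starts a bigram
        second[bg[1]] += k  # how often each character ends a bigram
    n = sum(pairs.values())
    subwords = set()
    for bg, o11 in pairs.items():
        if o11 < min_count:
            continue
        o12 = first[bg[0]] - o11   # a followed by something other than b
        o21 = second[bg[1]] - o11  # something other than a followed by b
        o22 = n - o11 - o12 - o21  # neither a first nor b second
        if chi_square(o11, o12, o21, o22) >= threshold and llr(o11, o12, o21, o22) >= threshold:
            subwords.add(bg)
    return subwords


def segment(sentence: str, subwords: Set[str]) -> List[str]:
    """Greedy left-to-right segmentation: take a two-character sub-word when one matches."""
    out, i = [], 0
    while i < len(sentence):
        if sentence[i:i + 2] in subwords:
            out.append(sentence[i:i + 2])
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out


if __name__ == "__main__":
    corpus = ["项羽怒", "项羽走", "沛公怒", "沛公走", "项羽曰", "沛公曰"]
    subs = extract_subwords(corpus)
    print(sorted(subs))               # ['沛公', '项羽']
    print(segment("沛公怒项羽", subs))  # ['沛公', '怒', '项羽']
```

On this toy corpus each name bigram scores 12.0 on chi-square and about 13.5 on G^2, so it is kept. Pairs that cross a name boundary, such as 羽怒, occur only once and fall below the frequency cutoff. The rule also handles only two-character sub-words. Longer units would need repeated merging, and this sketch does not attempt that.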