
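Aligning Single Source-Language Words with Target-Language Multi-word Units Based on a Bilingual Corpus
Cite this article: CHEN Bo-xing, DU Li-min. Aligning Single Source-Language Words with Target-Language Multi-word Units Based on a Bilingual Corpus [J]. Journal of Chinese Information Processing, 2003, 17(1): 13-19.
Authors: CHEN Bo-xing  DU Li-min
Affiliation: Center for Speech Interaction Technology Research, Institute of Acoustics, Chinese Academy of Sciences
Funding: National 973 Key Basic Research Development Program of China (G1998030505)
Abstract: Multi-word units include fixed collocations, multi-word idioms, and multi-word terms. This paper presents an algorithm, based on a bilingual spoken-language corpus, for automatically aligning single source-language words with target-language multi-word units. On the one hand, the algorithm measures the degree of association among several target-language words that correspond to the same source-language word by computing the normalized differences of their mutual information and t-scores, and uses this measure to extract multi-word units. On the other hand, it uses the averages of mutual information and t-score to measure how likely a multi-word unit and a single source-language word are to be translations of each other. Strategies such as local optimum selection, filtering of stopwords at the beginning and end of a unit, and preference for longer units solve this problem well. In addition, grading the phrase translation lexicon effectively reduces the number of incorrect translation entries in the high-grade lexicon, making the translation lexicon more practical.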

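Keywords: artificial intelligence  machine translation  bilingual alignment  multi-word unit  translation lexicon  average association score  normalized association score difference
Article ID: 1003-0077(2003)01-0013-07
Revision received: May 7, 2002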

Alignment of Single Source Words and Target Multi-word Units from Parallel Corpus
CHEN Bo-xing, DU Li-min. Alignment of Single Source Words and Target Multi-word Units from Parallel Corpus [J]. Journal of Chinese Information Processing, 2003, 17(1): 13-19.
Authors: CHEN Bo-xing  DU Li-min
Affiliation: Center for Speech Interaction Technology Research, Institute of Acoustics, Chinese Academy of Sciences
Abstract: Multi-word units include fixed collocations, multi-word phrases, and multi-word terms. This paper presents an algorithm for automatically aligning single source words with target multi-word units in a sentence-aligned parallel spoken-language corpus. Many researchers have used mutual information to extract multi-word units, but the retrieval results depend mainly on identifying suitable bigrams to initiate the iterative process. The algorithm uses the normalized mutual information difference and the normalized t-score difference among multiple target words that correspond to the same single source word to extract multi-word units, and then uses the average mutual information and average t-score to align single source words with target multi-word units. The algorithm also applies the Local Bests algorithm, a stopword filter, and a preference for longer units, among other methods. Grading the lexicon effectively reduces the number of incorrect entries in the high-level lexicon, which makes the translation lexicon more practical.
Keywords: artificial intelligence  machine translation  bilingual alignment  multi-word unit  translation lexicon  average association score  normalized association score difference
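
As a rough illustration of the two association measures the abstract names, the sketch below computes mutual information and a t-score for one source word and one target word from sentence-level co-occurrence counts. The formulas are the commonly used textbook forms and are an assumption here, since the abstract does not give the paper's exact definitions. The function name `association_scores` and its parameters are hypothetical. The averaged scores and normalized differences that the algorithm builds on top of these measures are not reproduced.

```python
import math


def association_scores(n_sentences, count_s, count_t, count_st):
    """Return (mutual information, t-score) for a source word s and a target word t.

    Assumed inputs (hypothetical names, not taken from the paper):
      n_sentences: number of aligned sentence pairs in the corpus
      count_s:     pairs whose source side contains s
      count_t:     pairs whose target side contains t
      count_st:    pairs that contain both s and t
    """
    if count_st <= 0:
        # Both measures are undefined without at least one co-occurrence.
        raise ValueError("count_st must be positive")

    p_s = count_s / n_sentences
    p_t = count_t / n_sentences
    p_st = count_st / n_sentences

    # Pointwise mutual information: log ratio of the observed joint
    # probability to the joint probability expected under independence.
    mi = math.log2(p_st / (p_s * p_t))

    # t-score: difference between observed and expected joint probability,
    # scaled by an estimate of its standard deviation.
    t = (p_st - p_s * p_t) / math.sqrt(p_st / n_sentences)
    return mi, t


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    mi, t = association_scores(n_sentences=1000, count_s=10, count_t=10, count_st=10)
    print(f"MI = {mi:.3f}, t = {t:.3f}")
```

Mutual information tends to favor rare pairs, while the t-score rewards pairs with more supporting evidence, which is one plausible reason to use the two measures together.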
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.