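Automatic Extraction of Chinese-English Phrase Translation Pairs
Cite this article: Liu Ying, Tie Zheng, Yu Chang. Automatic extraction of Chinese-English phrase translation pairs [J]. Computer Applications and Software, 2012, 29(7): 69-72.
Authors: Liu Ying, Tie Zheng, Yu Chang
Affiliation: Department of Chinese Language and Literature, Tsinghua University, Beijing 100084
Abstract: This paper describes how to extract bilingual phrase translation pairs from a parallel corpus. First, Chinese phrases are extracted from a Chinese patent corpus with the statistical model normalized expectation. The extracted phrases are then filtered using statistical and linguistic knowledge, so that the Chinese phrases remaining after filtering have high precision. Second, the word alignment tool Giza++ is used to obtain word alignments from a Chinese-English parallel corpus, and the open-source tool Moses extracts Chinese-English phrase alignments on the basis of those word alignments. Candidate Chinese-English phrase translation pairs are taken from the intersection of the phrase alignments and the high-quality Chinese phrases extracted earlier. Next, stop words, the log-likelihood ratio (LLR), and context entropy are used to filter the English phrase translations. Experimental results show that, after filtering, the precision of the extracted Chinese phrases is 97.6% and the precision of the Chinese-English phrase translation pairs is 92.4%.

Keywords: extraction; filtering; Chinese-English phrase translation pairs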
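
The abstract names normalized expectation as the scoring model for Chinese phrase candidates but does not give its formula. As a rough illustration only, the sketch below scores bigrams under one common formulation, in which the bigram count is divided by the average of the counts of its two component words (for bigrams this coincides with the Dice coefficient). The function name `normalized_expectation_bigrams`, the use of raw counts instead of probabilities, and the toy English tokens are all assumptions made for this example; the paper's exact variant and its handling of longer n-grams may differ.

```python
from collections import Counter


def normalized_expectation_bigrams(tokens):
    """Score each bigram by count(w1 w2) / mean(count(w1), count(w2)).

    Assumed bigram-only formulation; not taken from the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        pair: 2.0 * count / (unigrams[pair[0]] + unigrams[pair[1]])
        for pair, count in bigrams.items()
    }


if __name__ == "__main__":
    # Toy, made-up token sequence standing in for a segmented patent sentence.
    toy = "data processing device and data processing method".split()
    scores = normalized_expectation_bigrams(toy)
    for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

Bigrams whose words almost always occur together score close to 1, so a threshold on this score could serve as a first filter before the linguistic filtering the abstract mentions.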

AUTOMATIC EXTRACTION OF CHINESE-ENGLISH PHRASE TRANSLATION PAIRS
Liu Ying, Tie Zheng, Yu Chang. AUTOMATIC EXTRACTION OF CHINESE-ENGLISH PHRASE TRANSLATION PAIRS [J]. Computer Applications and Software, 2012, 29(7): 69-72.
Authors: Liu Ying, Tie Zheng, Yu Chang
Affiliation: Department of Chinese Language and Literature, Tsinghua University, Beijing 100084, China
Abstract: This paper studies how to extract bilingual phrase translation pairs from a parallel corpus. First, it uses the statistical model normalized expectation (NE) to extract Chinese phrases from a Chinese patent corpus. The extracted phrases are filtered using statistical and linguistic knowledge, so that the precision of the filtered Chinese phrases is high. Second, it uses Giza++, a word alignment tool, to obtain word alignments from a Chinese-English parallel corpus; on the basis of these word alignments, it uses Moses, an open-source tool, to extract Chinese-English phrase alignments. Candidate Chinese-English phrase translation pairs are taken from the intersection of the phrase alignments and the extracted high-quality Chinese phrases. Third, it uses stop words, the log-likelihood ratio (LLR), and context entropy (CE) to filter the English phrase translations. Experimental results show that, after filtering, the precision of the extracted Chinese phrases is 97.6% and that of the Chinese-English phrase translation pairs is 92.4%.
Keywords: extraction; filtering; Chinese-English phrase translation pair
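
The filtering step relies on LLR and context entropy, neither of which is defined in the abstract. The sketch below shows the standard textbook forms of both: the log-likelihood ratio statistic over a 2x2 co-occurrence table, and the Shannon entropy of the words observed next to a phrase. The function names, the way counts are gathered, and any thresholds one might apply are assumptions for illustration, not details confirmed by the paper.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: sentence pairs containing both the source and the target phrase
    k12: pairs with the source phrase only
    k21: pairs with the target phrase only
    k22: pairs with neither
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero cell contributes nothing to the sum
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total


def context_entropy(neighbors):
    """Shannon entropy (in bits) of the words seen adjacent to a phrase."""
    counts = Counter(neighbors)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Made-up counts: the two phrases always appear together.
    print(llr(10, 0, 0, 10))
    # Made-up right-hand neighbors of a candidate English phrase.
    print(context_entropy(["of", "in", "for", "with"]))
```

A high LLR suggests the two phrases co-occur far more often than chance would predict, and a high context entropy suggests the phrase appears in varied surroundings and is therefore more likely to be a complete unit rather than a fragment.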
This article is indexed in CNKI, Wanfang Data, and other databases.