
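An Effective Web-Based Method for Acquiring Bilingual Translation Pairs
Cite this article: GUO Ji, LV Ya-juan, LIU Qun. An Effective Web-Based Method for Acquiring Bilingual Translation Pairs [J]. Journal of Chinese Information Processing, 2008, 22(6): 103-109.
Authors: GUO Ji  LV Ya-juan  LIU Qun
Affiliations: 1. School of Software and Microelectronics, Peking University, Beijing 102600; 2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Abstract: Translations of named entities, new words, and terms have an important influence on the performance of systems such as machine translation, cross-language retrieval, and automatic question answering, but these translations are hard to obtain from existing translation dictionaries. This paper proposes a method for automatically acquiring high-quality bilingual translation pairs from Chinese web pages. The method exploits the characteristics of bilingual translation pairs in web pages and uses a statistical discriminative model that combines multiple recognition features to automatically mine the translation pairs present on websites. Experimental results show that for the bilingual translation lexicon built with this model, TOP1 accuracy reaches 82.1% and TOP3 accuracy reaches 94.5%. The paper also proposes a method that uses search engines to verify candidate translations; after verification, TOP1 accuracy can be raised to 84.3%.

Keywords: computer applications  Chinese information processing  bilingual translation pairs  statistical discriminative model  web mining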

An Effective Method to Extract Translation Pairs from Web Corpora
GUO Ji, LV Ya-juan, LIU Qun. An Effective Method to Extract Translation Pairs from Web Corpora [J]. Journal of Chinese Information Processing, 2008, 22(6): 103-109.
Authors: GUO Ji  LV Ya-juan  LIU Qun
Affiliation:1. School of Software and Microelectronics, Peking University, Beijing 102600, China;
2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology,
Chinese Academy of Sciences, Beijing 100190, China
Abstract: Translations of named entities, out-of-vocabulary words, and terms play an important role in many applications, such as machine translation, cross-language information retrieval, and question answering. However, these translations are hard to obtain from traditional bilingual dictionaries. This paper proposes a method to automatically extract high-quality translation pairs from Chinese web corpora. It analyzes the characteristics of bilingual translation pairs in web pages and then uses a statistical discriminative model that combines multiple features to extract them. Experimental results show that the extracted translation lexicon reaches a Top1 accuracy of 82.1% and a Top3 accuracy of 94.5%. The paper also proposes a verification method that uses search engines to further improve the accuracy of the initial extractions. After verification, Top1 accuracy rises to 84.3%.
Keywords: computer application  Chinese information processing  bilingual translation pairs  statistical discriminative model  web mining
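
The abstract does not say which features the model uses or what form it takes. As a rough illustration only, the sketch below ranks English candidates for a Chinese term with a log-linear scorer, which is one common kind of statistical discriminative model. The feature names (`in_parentheses`, `proximity`, `length_balance`), the hand-set weights, and the function names are all hypothetical; a real system would learn the weights from labelled pairs.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical, hand-set weights (assumption); the paper's model would be trained.
WEIGHTS: Dict[str, float] = {
    "in_parentheses": 2.0,
    "proximity": 1.5,
    "length_balance": 0.5,
}


def extract_features(term: str, candidate: str, context: str) -> Dict[str, float]:
    """Compute illustrative features for a (Chinese term, English candidate) pair."""
    t_pos = context.find(term)
    c_pos = context.find(candidate)
    if t_pos < 0 or c_pos < 0:
        return {name: 0.0 for name in WEIGHTS}
    gap = abs(c_pos - (t_pos + len(term)))  # characters between term and candidate
    opens_paren = c_pos > 0 and context[c_pos - 1] in "(（"
    n_words = len(candidate.split())
    return {
        # Candidate sits in parentheses directly after the term.
        "in_parentheses": 1.0 if opens_paren and gap == 1 else 0.0,
        # Closer candidates get a value nearer to 1.
        "proximity": 1.0 / (1.0 + gap),
        # Crude length check: assume about two Chinese characters per English word.
        "length_balance": 1.0 / (1.0 + abs(len(term) / 2 - n_words)),
    }


def rank_candidates(term: str, candidates: List[str], context: str) -> List[Tuple[str, float]]:
    """Return candidates sorted by softmax-normalised log-linear score."""
    scores = [
        sum(WEIGHTS[name] * value
            for name, value in extract_features(term, cand, context).items())
        for cand in candidates
    ]
    z = sum(math.exp(s) for s in scores)
    probs = (math.exp(s) / z for s in scores)
    return sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)


context = "他毕业于麻省理工学院(Massachusetts Institute of Technology)，后来加入 Bell Labs。"
ranked = rank_candidates(
    "麻省理工学院",
    ["Massachusetts Institute of Technology", "Bell Labs"],
    context,
)
```

Taking the first one or three entries of `ranked` corresponds to the Top1 and Top3 settings reported in the abstract; the search-engine verification step is not modelled here.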
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.