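Automatic Extraction of a Term Dictionary Based on an English-Chinese Parallel Corpus
Cite this article: LIANG Ming. Automatic Extraction of a Term Dictionary Based on an English-Chinese Parallel Corpus [J]. Digital Community & Smart Home (数字社区&智能家居), 2009(19).
Author: LIANG Ming (梁铭)
Affiliation: School of Computer Science and Technology, Soochow University; Suzhou Industrial Park Institute of Vocational Technology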
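Abstract: This paper proposes an algorithm for automatically extracting a term dictionary from an English-Chinese parallel corpus. The input is a bilingual corpus that has already been aligned, with the Chinese side word-segmented. English and Chinese part-of-speech taggers are applied to the English and Chinese texts, respectively. Nouns and noun phrases in the bilingual corpus are then counted to produce the candidate term set. For each English candidate term, the translation probability between it and each associated Chinese translation is computed. A threshold then filters out Chinese translations unrelated to the English candidate, and finally a greedy algorithm selects the highest-probability word as the Chinese translation of that English candidate.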
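
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the abstract does not say how the translation probability is computed, so the sketch assumes a relative co-occurrence frequency over aligned sentence pairs. The tag sets, the default threshold of 0.3, the one-to-one greedy assignment, and all function names are likewise assumptions made for the example. Input sentences are assumed to be already aligned, segmented, and POS-tagged as (token, tag) lists.

```python
from collections import Counter, defaultdict

# Assumed tag sets: Penn Treebank noun tags for English, a generic "n" for Chinese.
EN_NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ZH_NOUN_TAGS = {"n"}


def noun_candidates(tagged, noun_tags, joiner):
    """Collect single nouns plus maximal runs of adjacent nouns (noun phrases)."""
    candidates, run = set(), []
    for token, tag in list(tagged) + [("", "")]:  # sentinel flushes the last run
        if tag in noun_tags:
            run.append(token)
            candidates.add(token)
        else:
            if len(run) > 1:
                candidates.add(joiner.join(run))
            run = []
    return candidates


def translation_probs(pairs):
    """Assumed estimate: P(zh | en) = cooc(en, zh) / sum over zh' of cooc(en, zh')."""
    cooc = defaultdict(Counter)
    for en_tagged, zh_tagged in pairs:
        en_terms = noun_candidates(en_tagged, EN_NOUN_TAGS, " ")
        zh_terms = noun_candidates(zh_tagged, ZH_NOUN_TAGS, "")
        for en in en_terms:
            cooc[en].update(zh_terms)
    return {
        en: {zh: n / sum(counts.values()) for zh, n in counts.items()}
        for en, counts in cooc.items()
    }


def extract_dictionary(pairs, threshold=0.3):
    """Drop pairs below the threshold, then greedily take the most probable pairs."""
    probs = translation_probs(pairs)
    scored = [
        (p, en, zh)
        for en, cands in probs.items()
        for zh, p in cands.items()
        if p >= threshold
    ]
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))  # highest probability first
    dictionary, used_zh = {}, set()
    for _, en, zh in scored:
        # One-to-one assignment is an assumption of this sketch.
        if en not in dictionary and zh not in used_zh:
            dictionary[en] = zh
            used_zh.add(zh)
    return dictionary


# Toy aligned corpus, invented for illustration only.
pairs = [
    ([("the", "DT"), ("corpus", "NN"), ("is", "VBZ"), ("large", "JJ")],
     [("语料库", "n"), ("很", "d"), ("大", "a")]),
    ([("the", "DT"), ("corpus", "NN"), ("contains", "VBZ"), ("terms", "NNS")],
     [("语料库", "n"), ("包含", "v"), ("术语", "n")]),
    ([("terms", "NNS"), ("are", "VBP"), ("nouns", "NNS")],
     [("术语", "n"), ("是", "v"), ("名词", "n")]),
    ([("nouns", "NNS")],
     [("名词", "n")]),
]
term_dict = extract_dictionary(pairs)
```

On this toy corpus, `term_dict` maps `corpus` to `语料库`, `terms` to `术语`, and `nouns` to `名词`. A real corpus would need a far larger sample, and probably a better-founded probability estimate, before the threshold becomes meaningful.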

Keywords: term extraction; parallel corpus; sentence alignment; translation probability

English-Chinese Parallel Corpora Based on the Automatic Extraction of Terms Dictionary
LIANG Ming. English-Chinese Parallel Corpora Based on the Automatic Extraction of Terms Dictionary [J]. Digital Community & Smart Home, 2009(19).
Authors: LIANG Ming
Affiliation: 1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China; 2. Suzhou Industrial Park Institute of Vocational Technology, Suzhou 215021, China
Abstract: In the field of natural language processing, bilingual parallel corpora are becoming increasingly important. In recent years, many research institutions at home and abroad have been building bilingual corpora, and many researchers have studied them extensively. Sentence alignment is an important component of bilingual corpus construction and also a foundation of machine translation. This paper describes the research background and current state of terminology extraction based on the bilingual...
Keywords: term extraction; parallel corpora; sentence alignment; translation probability
This article is indexed in CNKI and other databases.