
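Automatic Extraction of a Term Dictionary Based on an English-Chinese Parallel Corpus
Cite this article: LIANG Ming. Automatic Extraction of a Term Dictionary Based on an English-Chinese Parallel Corpus[J]. 数字社区&智能家居 (Digital Community & Smart Home), 2009, 5(7): 5081-5083.
Author: LIANG Ming (梁铭)
Author affiliations: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000 [2] Suzhou Industrial Park Institute of Vocational Technology, Suzhou, Jiangsu 215021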
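Abstract: This paper proposes an algorithm for automatically extracting a term dictionary from an English-Chinese parallel corpus. The bilingual corpus used is already aligned, and the Chinese text has been word-segmented. English and Chinese part-of-speech tagging tools are applied to the English and Chinese corpora respectively. The nouns and noun phrases in the bilingual corpus are counted to generate the candidate term set. For each English candidate term, the translation probability between it and each of its associated Chinese translations is then computed. A threshold is set to filter out Chinese translations unrelated to the English candidate, and finally a greedy algorithm selects the word with the highest probability as the Chinese translation of that English candidate.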

Keywords: term extraction  parallel corpus  sentence alignment  translation probability
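
The candidate-generation step in the abstract (counting nouns and noun phrases in the tagged corpus) could be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: it assumes the corpus is already tokenized and tagged as `(word, tag)` pairs, treats every maximal run of consecutive noun-tagged tokens as one candidate, and uses hypothetical names (`noun_runs`, `candidate_terms`, `min_count`) that do not come from the paper. The noun test is passed in as a predicate because the English and Chinese taggers may use different tag sets.

```python
from collections import Counter


def noun_runs(tagged_sentence, is_noun, joiner=" "):
    """Yield each maximal run of consecutive noun-tagged tokens as one candidate."""
    run = []
    for word, tag in tagged_sentence:
        if is_noun(tag):
            run.append(word)
        elif run:
            yield joiner.join(run)
            run = []
    if run:  # flush a run that ends the sentence
        yield joiner.join(run)


def candidate_terms(tagged_corpus, is_noun, min_count=2, joiner=" "):
    """Count candidates over the corpus and keep those seen at least min_count times."""
    counts = Counter()
    for sentence in tagged_corpus:
        counts.update(noun_runs(sentence, is_noun, joiner))
    return {term: n for term, n in counts.items() if n >= min_count}


# Toy input with Penn Treebank style tags (an assumption about the tag set).
english = [
    [("the", "DT"), ("parallel", "JJ"), ("corpus", "NN"),
     ("is", "VBZ"), ("aligned", "VBN")],
    [("sentence", "NN"), ("alignment", "NN"), ("uses", "VBZ"),
     ("the", "DT"), ("corpus", "NN")],
]
en_candidates = candidate_terms(
    english, lambda tag: tag.startswith("NN"), min_count=1
)
```

For segmented Chinese text the same functions can be called with `joiner=""` and a predicate matching the Chinese tagger's noun tags.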

English-Chinese Parallel Corpora Based on the Automatic Extraction of Terms Dictionary
Affiliation: LIANG Ming (1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China; 2. Suzhou Industrial Park Institute of Vocational Technology, Suzhou 215021, China)
Abstract: In natural language processing, bilingual parallel corpora are becoming increasingly important. In recent years, many research institutions at home and abroad have been building bilingual corpora, and many researchers have studied them extensively. Sentence alignment is an important component of bilingual corpus construction and also a foundation for machine translation. This paper describes the research background and current state of terminology extraction based on bilingual parallel corpora, and then introduces several methods and basic principles used in sentence alignment. The bilingual corpus used in the experiment is already aligned, and the Chinese text has been word-segmented. English and Chinese POS tagging tools are used to tag the English and Chinese corpora respectively. The candidate term set is produced by counting the nouns and noun phrases in both corpora. The translation probability between each English candidate term and each of its Chinese translation candidates is then calculated. A threshold is set to filter out Chinese translations unrelated to the English candidate, and finally a greedy algorithm selects the candidate with the highest probability as the Chinese translation of the English term.
Keywords:term extraction  parallel corpora  sentence alignment  translation probability
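
The last three steps of the abstract (translation probability, threshold filtering, greedy selection) might look roughly like the sketch below. The abstract does not state how the translation probability is estimated, so this sketch assumes a simple co-occurrence estimate over aligned sentence pairs, p(zh | en) = count(en and zh co-occur) / count(en). It also reads "greedy" as a one-to-one assignment in descending order of probability. Both choices, and the names `translation_probs`, `greedy_dictionary`, and `threshold`, are illustrative assumptions rather than the paper's actual method.

```python
from collections import Counter
from itertools import product


def translation_probs(aligned_pairs):
    """Estimate p(zh | en) from co-occurrence in aligned sentence pairs.

    aligned_pairs: iterable of (en_terms, zh_terms), the candidate terms
    found in each aligned sentence pair. The estimator is an assumption.
    """
    en_count = Counter()
    pair_count = Counter()
    for en_terms, zh_terms in aligned_pairs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        en_count.update(en_set)
        pair_count.update(product(en_set, zh_set))
    return {(en, zh): c / en_count[en] for (en, zh), c in pair_count.items()}


def greedy_dictionary(probs, threshold=0.5):
    """Drop pairs below the threshold, then assign greedily by probability."""
    kept = [(p, en, zh) for (en, zh), p in probs.items() if p >= threshold]
    # Highest probability first; ties broken alphabetically for determinism.
    kept.sort(key=lambda t: (-t[0], t[1], t[2]))
    dictionary, used_zh = {}, set()
    for _, en, zh in kept:
        if en not in dictionary and zh not in used_zh:
            dictionary[en] = zh
            used_zh.add(zh)
    return dictionary


# Toy aligned data: candidate terms per sentence pair (made up for illustration).
pairs = [
    (["corpus", "term"], ["语料库", "术语"]),
    (["corpus"], ["语料库"]),
    (["term", "dictionary"], ["术语", "词典"]),
    (["dictionary"], ["词典"]),
]
probs = translation_probs(pairs)
term_dict = greedy_dictionary(probs, threshold=0.6)
```

On this toy data every correct pair co-occurs in all sentences containing the English term, while the spurious pairs reach only 0.5 and fall below the threshold. On real data a raw co-occurrence ratio favors rare terms, so the threshold and a minimum frequency would need tuning.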
This article is indexed in VIP (维普) and other databases.