
Automatic Identification of Bilingual Chunks for Spoken-Language Translation
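Cite this article: CHENG Wei, ZHAO Jun, LIU Fei-Fan, XU Bo. Automatic identification of bilingual chunks for spoken-language translation[J]. 计算机学报 (Chinese Journal of Computers), 2004, 27(8): 1016-1020.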
Authors: 程葳, 赵军, 刘非凡, 徐波
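Affiliations: 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080; Institute of Artificial Intelligence, Beijing City University, Beijing 100083
2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080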
Funding: Supported by the National Natural Science Foundation of China (60272041, 60121302)
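Abstract: Chunk identification is the foundation of chunk-based processing. Many results have already been reported on monolingual chunking, but machine translation needs chunk analysis that relates both languages. Motivated by the practical needs of spoken-language translation, this paper proposes the concept of the "bilingual chunk" and, building on it, implements a new method for automatically identifying bilingual chunks in a parallel corpus. The method combines statistics with rules, ensuring that the identified bilingual chunks are both semantically coherent and syntactically well-formed. Experiments on a 60,000-sentence spoken-language corpus from the hotel reservation domain show that the method identifies bilingual chunks in Chinese-English parallel text with an accuracy of about 80%.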

Keywords: chunk; chunk analysis; corpus; spoken-language translation

Automatic Identification of Co-Chunks for Spoken-language Translation
CHENG Wei, ZHAO Jun, LIU Fei-Fan, XU Bo. Automatic Identification of Co-Chunks for Spoken-language Translation[J]. Chinese Journal of Computers, 2004, 27(8): 1016-1020.
Authors: CHENG Wei, ZHAO Jun, LIU Fei-Fan, XU Bo
Affiliation: CHENG Wei 1),2); ZHAO Jun 1); LIU Fei-Fan 1); XU Bo 1)
Abstract: Chunk parsing is a basic step in chunk-based processing. Many chunk parsing methods exist for individual languages; machine translation, however, specifically requires bilingual chunk parsing. This paper presents the concept of co-chunks, which are defined according to the characteristics of both Chinese and English. It also proposes a new algorithm that automatically identifies co-chunks in a sentence-aligned bilingual corpus. The algorithm integrates rules into a statistical model, which ensures that the identified co-chunks have both a legal syntactic structure and a semantic interpretation. It is trained on a sentence-aligned Chinese-English bilingual corpus of about sixty thousand sentence pairs, consisting of spontaneous utterances from hotel reservation dialogs. Experiments show that the method achieves an accuracy above 80%.
Keywords: chunk; chunk parsing; corpora; spoken-language translation
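The abstract states only that the method combines statistics with rules so that each co-chunk is both syntactically well-formed and semantically interpretable; it does not describe the algorithm itself. The Python sketch below is therefore purely illustrative and is not the authors' method. It assumes that each sentence pair already carries word alignments and part-of-speech tags (the abstract mentions only sentence alignment), extracts candidate span pairs under the alignment-consistency constraint familiar from phrase-based translation, filters them with toy POS-pattern rules (`ZH_RULES` and `EN_RULES` are hypothetical), and keeps only pairs that recur at least `min_count` times in the corpus as a simple statistical criterion.

```python
from collections import Counter
from typing import Dict, Iterator, List, Set, Tuple

Span = Tuple[int, int]                # inclusive (start, end) token indices
Alignment = Set[Tuple[int, int]]      # (Chinese index, English index) links
SentencePair = Tuple[List[str], List[str], List[str], List[str], Alignment]

# Hypothetical toy rules: POS sequences allowed to form a chunk on each side.
ZH_RULES = {("n",), ("a", "n")}
EN_RULES = {("NN",), ("JJ", "NN"), ("DT", "JJ", "NN")}


def consistent_pairs(zh_len: int, align: Alignment,
                     max_len: int = 4) -> Iterator[Tuple[Span, Span]]:
    """Yield span pairs where no word inside either span links outside the other."""
    for i1 in range(zh_len):
        for i2 in range(i1, min(zh_len, i1 + max_len)):
            targets = [j for (i, j) in align if i1 <= i <= i2]
            if not targets:
                continue  # an unaligned Chinese span has no English counterpart
            j1, j2 = min(targets), max(targets)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject if some English word in [j1, j2] is linked outside [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in align if j1 <= j <= j2):
                yield (i1, i2), (j1, j2)


def identify_co_chunks(corpus: List[SentencePair], min_count: int = 2,
                       max_len: int = 4) -> Dict[Tuple[str, str], int]:
    """Rule filter on each candidate, then a corpus-frequency threshold."""
    counts: Counter = Counter()
    for zh, zh_pos, en, en_pos, align in corpus:
        for (i1, i2), (j1, j2) in consistent_pairs(len(zh), align, max_len):
            # Rule step: both sides must match an allowed POS pattern.
            if (tuple(zh_pos[i1:i2 + 1]) in ZH_RULES
                    and tuple(en_pos[j1:j2 + 1]) in EN_RULES):
                counts[(" ".join(zh[i1:i2 + 1]), " ".join(en[j1:j2 + 1]))] += 1
    # Statistical step: keep only pairs that recur often enough.
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Toy hotel-reservation examples (invented for illustration).
corpus = [
    (["预订", "单人", "房间"], ["v", "a", "n"],
     ["book", "a", "single", "room"], ["VB", "DT", "JJ", "NN"],
     {(0, 0), (1, 2), (2, 3)}),
    (["取消", "单人", "房间"], ["v", "a", "n"],
     ["cancel", "the", "single", "room"], ["VB", "DT", "JJ", "NN"],
     {(0, 0), (1, 2), (2, 3)}),
]
print(identify_co_chunks(corpus))
# {('单人 房间', 'single room'): 2, ('房间', 'room'): 2}
```

Here the unaligned English determiners (`a`, `the`) fall outside every extracted span, so only `single room` and `room` survive; a real system would also need a policy for attaching unaligned words and a much richer rule set.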
This article is indexed in CNKI, Wanfang Data, and other databases.