
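Automatically Acquiring Chinese Parsing Knowledge Based on a Bilingual Language Model
Cite this article: LU Ya-Juan, LI Sheng, ZHAO Tie-Jun. Automatically Acquiring Chinese Parsing Knowledge Based on a Bilingual Language Model [J]. Chinese Journal of Computers (计算机学报), 2003, 26(1): 32-38.
Authors: LU Ya-Juan (吕雅娟)  LI Sheng (李生)  ZHAO Tie-Jun (赵铁军)
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001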
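Funding: Supported by the National "863" High-Tech Research and Development Program of China (2001AA114101) and a cooperative project of the Microsoft-Harbin Institute of Technology Machine Translation Joint Laboratory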
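Abstract: This paper proposes a new method for automatically acquiring Chinese parsing knowledge. Starting from a bilingual corpus and guided by a bilingual language model, the method uses English parsing and bilingual word alignment to obtain parses of Chinese sentences. Chinese chunk boundary information and simple parsing rules can then be extracted from these parses. Experimental results show that the automatically acquired chunk boundaries agree well with an existing Chinese parsing scheme, which demonstrates that the method is feasible and effective. By making full use of existing research results for English, the method offers a new line of approach for research on Chinese parsing.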

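Keywords: bilingual model  Chinese parsing  automatic knowledge acquisition  natural language processing  knowledge acquisition  bilingual corpus
Revised: January 21, 2002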

Automatically Acquiring Chinese Parsing Knowledge Based on a Bilingual Language Model
LU Ya-Juan, LI Sheng, ZHAO Tie-Jun. Automatically Acquiring Chinese Parsing Knowledge Based on a Bilingual Language Model [J]. Chinese Journal of Computers, 2003, 26(1): 32-38.
Authors: LU Ya-Juan  LI Sheng  ZHAO Tie-Jun
Abstract: Knowledge acquisition is a bottleneck for practical applications of Chinese parsing. This paper presents a new method for acquiring Chinese parsing knowledge from sentence-aligned English-Chinese bilingual corpora. Using English parsing and word alignment results, the method first performs bilingual structure alignment based on a bilingual language model, Inversion Transduction Grammar. Chinese bracketing structures are then extracted automatically. In this way the method creates structurally bracketed Chinese corpora by taking full advantage of English parsing and bilingual corpora. The resulting corpora are useful for further Chinese corpus annotation and parsing knowledge acquisition. Preliminary experiments show that the acquired knowledge accords well with manually created knowledge. The method is particularly useful for acquiring parsing knowledge for a less-studied language from a second, well-studied language. Although this paper deals with Chinese and English, the proposed method is also applicable to other language pairs.
Keywords:parsing  knowledge acquisition  bilingual language model  bilingual corpus
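
The idea of carrying English structure over to Chinese through word alignment can be illustrated with a minimal sketch. It is not the paper's Inversion Transduction Grammar alignment; it is a simpler span-projection heuristic, and the function names (`project_spans`, `bracket`), the toy sentence pair, and the alignment links are all hypothetical and chosen only for illustration. An English constituent is projected to the smallest Chinese span covering its aligned words, and the projection is rejected if that span also contains a Chinese word aligned to English material outside the constituent.

```python
from collections import Counter


def project_spans(en_spans, alignment):
    """Project English constituent spans onto Chinese word positions.

    en_spans:  list of (start, end) inclusive English word-index spans.
    alignment: set of (en_index, zh_index) word-alignment links.
    Returns a list of (start, end) inclusive Chinese spans.
    NOTE: a simplified heuristic, not the ITG-based alignment of the paper.
    """
    zh_spans = []
    for start, end in en_spans:
        inside = {z for e, z in alignment if start <= e <= end}
        if not inside:
            continue  # nothing aligned, so nothing to project
        lo, hi = min(inside), max(inside)
        outside = {z for e, z in alignment if not (start <= e <= end)}
        if any(lo <= z <= hi for z in outside):
            continue  # crossing alignment: projection is not a clean bracket
        zh_spans.append((lo, hi))
    return zh_spans


def bracket(words, spans):
    """Render words with one pair of square brackets per span."""
    opens = Counter(s for s, _ in spans)
    closes = Counter(e for _, e in spans)
    return " ".join(
        "[" * opens[i] + w + "]" * closes[i] for i, w in enumerate(words)
    )


# Hypothetical toy example: "He bought a new book"
zh = ["他", "买", "了", "一", "本", "新", "书"]
links = {(0, 0), (1, 1), (2, 3), (3, 5), (4, 6)}  # assumed word alignment
en_constituents = [(2, 4), (1, 4), (0, 4)]        # NP, VP, S from an English parse

zh_spans = project_spans(en_constituents, links)
print(bracket(zh, zh_spans))
```

Unaligned Chinese words that fall inside a projected span (here the aspect marker and the measure word) are simply absorbed into the bracket, which is one reason a grammar-constrained alignment such as the one described in the abstract is preferable to this heuristic on real data.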
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.