
汉藏短语抽取
引用本文:诺明花,张立强,刘汇丹,吴健,丁治明.汉藏短语抽取[J].中文信息学报,2011,25(2):105-111.
作者姓名:诺明花  张立强  刘汇丹  吴健  丁治明
作者单位:1. 中国科学院 软件研究所,北京 100190;2. 中国科学院 研究生院,北京 100049
基金项目:中国科学院"西部行动计划高新技术项目"资助
摘    要:该文将从汉藏法律法规和公文领域平行语料中提取双语短语对。考虑现阶段藏文资源匮乏,提出两步汉藏短语抽取方法。第一步是提取汉语有效语块,这部分工作不是该文工作重点。第二步是获取待翻译汉语短语的译文,该模块提出藏文词序列相交算法抽取藏文短语。该算法可以很好地抽取1-1和1-n连续和非连续藏文短语。

关 键 词:汉藏短语抽取  藏文信息处理  中文信息处理  

Chinese Tibetan Phrase Extraction
NUO Minghua, ZHANG Liqiang, LIU Huidan, WU Jian, DING Zhiming. Chinese Tibetan Phrase Extraction[J]. Journal of Chinese Information Processing, 2011, 25(2): 105-111.
Authors:NUO Minghua  ZHANG Liqiang  LIU Huidan  WU Jian  DING Zhiming
Affiliation:1. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
2. Graduate University of the Chinese Academy of Sciences, Beijing 100049, China
Abstract:This paper describes a method for extracting phrase pairs from a domain-specific Chinese-Tibetan bilingual corpus of laws, regulations, and official documents. Widely used phrase extraction methods depend heavily on word alignment results or on additional resources such as part-of-speech tagging or syntactic analysis. Because such resources are currently scarce for Tibetan, this paper proposes a two-phase method for extracting Chinese-Tibetan phrase pairs. The first phase extracts Chinese phrases (multi-word chunks) using Nagao's algorithm and a substring reduction algorithm. The second phase extracts candidate Tibetan translations for each Chinese phrase to be translated. For this phase, the paper proposes the Tibetan word sequence intersection algorithm (TIA) to extract Tibetan phrases. TIA works well for both 1-1 and 1-n translations, whether the Tibetan phrase is continuous or discontinuous.
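The two phases in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the details of either phase, so everything here is an assumption for illustration only: plain n-gram counting stands in for Nagao's algorithm, `reduce_substrings` is one common reading of substring reduction (drop an n-gram when a longer n-gram containing it has the same frequency), and `intersect_sequences` is a naive guess at what a word sequence intersection might look like, not the authors' TIA. All function names are hypothetical, and the Tibetan side uses placeholder tokens (`t1`, `x`, ...) rather than real Tibetan words.

```python
from collections import Counter


def frequent_ngrams(sentences, min_len=2, max_len=3, min_count=2):
    """Count word n-grams and keep those seen at least min_count times.

    Assumption: simple counting replaces Nagao's algorithm here.
    """
    counts = Counter()
    for words in sentences:
        for n in range(min_len, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


def reduce_substrings(ngrams):
    """Drop an n-gram if a longer n-gram with the same count contains it."""
    def contains(longer, shorter):
        k = len(shorter)
        return any(longer[i:i + k] == shorter
                   for i in range(len(longer) - k + 1))

    kept = {}
    for gram, c in ngrams.items():
        absorbed = any(
            len(other) > len(gram) and ngrams[other] == c
            and contains(other, gram)
            for other in ngrams
        )
        if not absorbed:
            kept[gram] = c
    return kept


def intersect_sequences(target_sentences):
    """Return the words shared by every sentence, ordered as in the first.

    Assumption: each input is the target-side sentence aligned with a
    source sentence that contains the phrase being translated.
    """
    if not target_sentences:
        return []
    common = set(target_sentences[0])
    for words in target_sentences[1:]:
        common &= set(words)
    result, seen = [], set()
    for w in target_sentences[0]:
        if w in common and w not in seen:
            result.append(w)
            seen.add(w)
    return result


# Toy source-side corpus (already word-segmented).
chinese = [
    ["中华", "人民", "共和国", "宪法"],
    ["中华", "人民", "共和国", "公民"],
    ["人民", "代表"],
]
phrases = reduce_substrings(frequent_ngrams(chinese))
print(phrases)

# Placeholder target-side sentences aligned with sentences containing one phrase.
tibetan = [
    ["t1", "t2", "x", "t3"],
    ["y", "t1", "t2", "t3", "z"],
    ["t1", "t2", "w", "t3"],
]
print(intersect_sequences(tibetan))
```

In this toy run the shared words `t1 t2 t3` are not adjacent in the first placeholder sentence, which is the kind of discontinuous 1-n case the abstract mentions. A practical version would likely need frequency thresholds rather than a strict intersection, since real aligned sentences rarely share every word of the translation.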
Keywords:Chinese Tibetan phrase extraction  Tibetan information processing  Chinese information processing  