
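中文机构名称的识别与分析 (Identification and Analysis of Chinese Organization and Institution Names)
Cite this article: 张小衡, 王玲玲. 中文机构名称的识别与分析 [J]. 中文信息学报, 1997, 11(4): 22-33.
Authors: 张小衡 (Zhang Xiaoheng), 王玲玲 (Wang Lingling)
Affiliation: 香港理工大学中文及双语学系 (Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University)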
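Abstract (translated from the Chinese): Chinese organization names are vast in number and new ones appear constantly; the great majority are not listed in dictionaries, which creates difficulties for natural language processing. From a linguistic point of view, however, an organization name is a modifier-head compound proper noun and, at the same time, a relatively simple kind of modifier-head noun phrase, with its own structural regularities and morphological markers. Focusing on the names of colleges and universities, and drawing on real corpus data from mainland China, Hong Kong, and Taiwan, this paper discusses the identification and analysis of organization names from both the linguistic and the computational side, and summarizes the corresponding rules. When these rules were applied to identify university names in a corpus of more than six million characters from the three regions, precision (counting a name as correct only when both its left and right boundaries are located correctly) reached 97.3% and recall was 96.9%. The rules can also be applied in other areas, such as intelligent pinyin-to-character conversion and machine translation.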

Keywords (translated from the Chinese): Organization names  Proper nouns  Phrase analysis  Natural language processing

Identification and Analysis of Chinese Organization and Institution Names
Zhang Xiaoheng, Wang Lingling. Identification and Analysis of Chinese Organization and Institution Names [J]. Journal of Chinese Information Processing, 1997, 11(4): 22-33.
Authors: Zhang Xiaoheng, Wang Lingling
Affiliation: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University
Abstract: As important proper nouns, Chinese names of organizations and institutions play an indispensable role in language communication. Unfortunately, because of their infinite quantity, constant creation and disappearance, and relative length and complexity, most of these names have failed to find their way into the Chinese dictionaries of computer systems. Linguistically, however, these proper nouns can be viewed as a special group of compound nouns and as a simple category of noun phrase, possessing their own formation rules and physical markers. This paper presents a pioneering discussion of the analysis of Chinese names of organizations and institutions from a computational point of view. Useful linguistic rules have been drawn from the discussion and applied to the identification of names of organizations and institutions in the 6,000,000-character Mainland-Hong Kong-Taiwan corpus of modern Chinese developed by Hong Kong Polytechnic University. Preliminary experiments show that both precision and recall rates for identifying names of colleges and universities are over 96%.
Keywords: Organization and institution names  Proper nouns  Phrase analysis  Natural language processing
This article is indexed in CNKI, 维普 (VIP), and other databases.