
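Automatic Part-of-Speech Identification for Chinese Words Based on k-Nearest Neighbors
Cite this article:SUN Mao-Song, ZUO Zheng-Ping, TSOU B K. Automatic Part-of-Speech Identification for Chinese Words Based on k-Nearest Neighbors[J]. Chinese Journal of Computers, 2000, 23(2): 166-170.
Authors:SUN Mao-Song  ZUO Zheng-Ping  TSOU B K
Affiliations:1. State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084
2. Language Information Sciences Research Centre, City University of Hong Kong, Hong Kong
Funding:National Natural Science Foundation of China (69705005)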
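Abstract:Unknown word processing occupies an important place in natural language processing applications that target large-scale real-world text. Part-of-speech identification means having the machine automatically assign a suitable part-of-speech tag to an unknown word whose part of speech is not known. This paper proposes a part-of-speech identification algorithm based on k-nearest neighbors and sets up corresponding experiments, supported by a Chinese corpus of 100 million characters and a 600,000-character Chinese corpus that has been manually segmented and part-of-speech tagged. Preliminary results show that the algorithm's average accuracy on the open word classes of Chinese (nouns, verbs, and adjectives) is 99.21%, 84.73%, and 76.5…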

Keywords:part-of-speech identification  unknown word processing  natural language processing  Chinese
Revised:1999-02-01

Part-of-Speech Identification for Unknown Chinese Words Based on k-Nearest-Neighbors Strategy
SUN Mao-Song,ZUO Zheng-Ping,TSOU B K.Part-of-Speech Identification for Unknown Chinese Words Based on k-Nearest-Neighbors Strategy[J].Chinese Journal of Computers,2000,23(2):166-170.
Authors:SUN Mao-Song  ZUO Zheng-Ping  TSOU B K
Abstract:Unknown word processing plays an important role in many natural language applications that target large-scale unrestricted text. Part-of-speech identification is the task of automatically assigning a part-of-speech tag to an unknown word whose part of speech is not known. This paper presents a part-of-speech identification algorithm based on a k-nearest-neighbors strategy. Preliminary experiments, supported by a Chinese corpus of 100M characters and a part-of-speech-annotated corpus of 0.6M characters, show that the algorithm's average accuracy reaches 99.21%, 84.73%, and 70.67% for Chinese nouns, verbs, and adjectives respectively.
Keywords:part-of-speech identification  unknown word processing  Chinese information processing  natural language processing  artificial intelligence
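The abstract names a k-nearest-neighbors strategy but does not describe its features or similarity measure. As a rough illustration of the general idea only, the sketch below assumes each word is represented by a sparse vector of context-word counts, compares vectors by cosine similarity, and takes a majority vote over the k most similar labeled words. The function names (`cosine`, `knn_pos`), the feature representation, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_pos(unknown_vec, labeled, k=3):
    """Return the majority tag among the k labeled words most similar
    to the unknown word. `labeled` is a list of (context_vector, tag)."""
    ranked = sorted(
        labeled,
        key=lambda item: cosine(unknown_vec, item[0]),
        reverse=True,
    )
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labeled words (hypothetical): context-word counts paired with a tag.
labeled = [
    ({"the": 3, "a": 2}, "noun"),
    ({"the": 2, "of": 1}, "noun"),
    ({"to": 3, "will": 2}, "verb"),
    ({"to": 1, "not": 2}, "verb"),
    ({"very": 3, "more": 1}, "adjective"),
]

# Context counts observed for an unknown word (also hypothetical).
unknown = {"the": 4, "a": 1, "of": 1}
predicted = knn_pos(unknown, labeled, k=3)
print(predicted)
```

In a setting like the one the abstract describes, the context vectors would presumably be gathered from the large raw corpus and the tags from the annotated corpus, but that mapping is an assumption here.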
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.