Part-of-Speech Identification for Unknown Chinese Words Based on k-Nearest-Neighbors Strategy (基于k-近似的汉语词类自动判定)


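Citation:	SUN Mao-Song, ZUO Zheng-Ping, TSOU B K. Part-of-Speech Identification for Unknown Chinese Words Based on k-Nearest-Neighbors Strategy[J]. Chinese Journal of Computers, 2000, 23(2): 166-170.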

Authors:	SUN Mao-Song, ZUO Zheng-Ping, TSOU B K

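Affiliations:	1. State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084; 2. Language Information Sciences Research Centre, City University of Hong Kong, Hong Kong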

Funding:	National Natural Science Foundation of China (69705005)

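Abstract (translated from the Chinese):	Unknown-word processing plays an important role in natural language processing applications that target large-scale real-world text. Automatic part-of-speech identification means having the machine assign an appropriate part-of-speech tag to an unknown word whose part of speech is not known. This paper proposes a part-of-speech identification algorithm based on a k-nearest-neighbors strategy and builds corresponding experiments, supported by a 100-million-character Chinese corpus and a 600,000-character Chinese corpus that was manually segmented and part-of-speech tagged. The experimental results preliminarily show that the algorithm's average accuracy on the open word classes of Chinese (nouns, verbs, and adjectives) is 99.21%, 84.73%, and 76.5…
Keywords (translated from the Chinese):	part-of-speech identification; unknown word processing; natural language processing; Chinese
Revision received:	1999-02-01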
Part-of-Speech Identification for Unknown Chinese Words Based on k-Nearest-Neighbors Strategy

SUN Mao-Song, ZUO Zheng-Ping, TSOU B K. Part-of-Speech Identification for Unknown Chinese Words Based on k-Nearest-Neighbors Strategy[J]. Chinese Journal of Computers, 2000, 23(2): 166-170.

Authors:	SUN Mao-Song, ZUO Zheng-Ping, TSOU B K

Abstract:	Unknown-word processing plays an important role in many natural language applications that target large-scale unrestricted text. The task of part-of-speech identification is to automatically assign a part-of-speech tag to an unknown word that carries no part-of-speech information. This paper presents a part-of-speech identification algorithm based on a k-nearest-neighbors strategy. A preliminary experiment, supported by a Chinese corpus of 100M characters and a part-of-speech-annotated corpus of 0.6M characters, shows that the algorithm reaches average accuracy rates of 99.21%, 84.73%, and 70.67% for Chinese nouns, verbs, and adjectives, respectively.

Keywords:	part of speech identification; unknown word processing; Chinese information processing; natural language processing; artificial intelligence
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
