
Research on the Method of Automatic Correction of Chinese Part-of-Speech Tagging

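Cite this article:	QIAN Yi-li, ZHENG Jia-heng. Research on the Method of Automatic Correction of Chinese Part-of-Speech Tagging[J]. Journal of Chinese Information Processing, 2004, 18(2): 31-36.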

Authors:	钱揖丽 (QIAN Yi-li), 郑家恒 (ZHENG Jia-heng)

Affiliation:	Department of Computer Science, Shanxi University

Funding:	National High Technology Research and Development Program of China (863 Program)

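Abstract:	Disambiguating multi-category words (words that can belong to more than one part of speech) is a difficult problem in part-of-speech tagging of Chinese corpora, and it seriously degrades tagging quality. To address it, this paper proposes a method for automatically correcting the part-of-speech tags of multi-category words. The method uses data mining to extract useful information from a correctly tagged training corpus and automatically generates correction rules for multi-category words; it then applies these rules to automatically correct the machine's initial tagging of a corpus, improving the tagging quality of multi-category words. In closed and open tests on a 500,000-character Chinese corpus, the tagging accuracy for multi-category words after correction increased by 11.32% and 5.97%, respectively.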
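Keywords:	computer application; Chinese information processing; multi-category word; Chinese part-of-speech tagging; automatic correction; rough sets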
Article ID:	1003-0077(2004)02-0030-06
Revised:	August 6, 2003
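The abstract describes two phases: mining correction rules from a correctly tagged corpus, then applying them to the machine's initial tagging. The sketch below illustrates only the second phase, plus one way to measure accuracy on multi-category words. It is not the authors' implementation: the abstract does not give the rule format, so the context window (previous and next tag), the `None` wildcard, the PKU-style tag names (`r`, `q`, `v`, `n`, ...), the example rule, and the function names `rule_matches`, `correct` and `multi_category_accuracy` are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

# A token is (word, tag); tag names are illustrative PKU-style labels.
Token = Tuple[str, str]
# Hypothetical rule format: (word, required previous tag, required next tag, corrected tag).
# None in a context slot means "any tag". The paper's actual rule format is not stated in the abstract.
Rule = Tuple[str, Optional[str], Optional[str], str]


def rule_matches(rule: Rule, word: str, prev_tag: str, next_tag: str) -> bool:
    """True if the rule's word and all non-wildcard context tags match."""
    r_word, r_prev, r_next, _ = rule
    return (r_word == word
            and (r_prev is None or r_prev == prev_tag)
            and (r_next is None or r_next == next_tag))


def correct(tokens: List[Token], rules: List[Rule], multi_category: Set[str]) -> List[Token]:
    """Re-tag multi-category words with the first matching rule.

    Context tags are read from the initial (uncorrected) tagging; words that are
    not multi-category, or that match no rule, keep their initial tag.
    """
    corrected = []
    for i, (word, tag) in enumerate(tokens):
        if word in multi_category:
            prev_tag = tokens[i - 1][1] if i > 0 else "<S>"
            next_tag = tokens[i + 1][1] if i + 1 < len(tokens) else "</S>"
            for rule in rules:
                if rule_matches(rule, word, prev_tag, next_tag):
                    tag = rule[3]
                    break
        corrected.append((word, tag))
    return corrected


def multi_category_accuracy(predicted: List[Token], gold: List[Token], multi_category: Set[str]) -> float:
    """Share of multi-category tokens whose predicted tag equals the gold tag."""
    pairs = [(p, g) for p, g in zip(predicted, gold) if g[0] in multi_category]
    if not pairs:
        return 0.0
    return sum(p[1] == g[1] for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    multi = {"研究"}  # 研究 can be a verb ("to study") or a noun ("research")
    initial = [("这", "r"), ("项", "q"), ("研究", "v"), ("很", "d"), ("重要", "a"),
               ("我们", "r"), ("研究", "v"), ("语料", "n")]
    gold = [("这", "r"), ("项", "q"), ("研究", "n"), ("很", "d"), ("重要", "a"),
            ("我们", "r"), ("研究", "v"), ("语料", "n")]
    rules = [("研究", "q", None, "n")]  # hypothetical rule: after a measure word, tag as noun
    fixed = correct(initial, rules, multi)
    print(multi_category_accuracy(initial, gold, multi),
          multi_category_accuracy(fixed, gold, multi))  # → 0.5 1.0
```

Reading context from the initial tagging rather than the partially corrected output keeps each decision independent of rule order across tokens; a real system might instead iterate until no rule fires.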
Research on the Method of Automatic Correction of Chinese Part-of-Speech Tagging

QIAN Yi-li, ZHENG Jia-heng. Research on the Method of Automatic Correction of Chinese Part-of-Speech Tagging[J]. Journal of Chinese Information Processing, 2004, 18(2): 31-36.

Authors:	QIAN Yi-li, ZHENG Jia-heng

Affiliation:	Department of Computer Science, Shanxi University

Abstract:	Disambiguating multi-category words is one of the main difficulties in part-of-speech tagging of Chinese text, and it greatly affects the quality of corpus processing. To address this problem, the paper describes an approach to automatically correcting the part-of-speech tags of multi-category words. It acquires correction rules for multi-category words from correctly tagged corpora using rough sets and data mining, and then applies these rules to correct the corpora automatically. In closed and open tests on a corpus of 500,000 Chinese characters, the part-of-speech tagging accuracy for multi-category words increased by 11.32% and 5.97%, respectively.

Keywords:	computer application; Chinese information processing; multi-category word; Chinese part-of-speech tagging; automatic correction; rough sets
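The English abstract names rough sets as the basis for rule acquisition. One standard rough-set reading, sketched here under stated assumptions, treats each occurrence of a multi-category word in the correctly tagged corpus as a row of a decision table: the condition attributes are the word and its neighboring tags, and the decision is the gold tag. Rows with identical conditions form an indiscernibility class. A class whose rows all share one tag lies in the lower approximation of that tag and yields a certain rule; a mixed class falls in the boundary region and yields none. The dependency degree γ = |POS| / |U| reports the fraction of rows that the chosen attributes fully determine. The attribute choice, the `min_support` threshold, the toy data and all function names are hypothetical; the abstract does not state the paper's actual attributes or reduction procedure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Token = Tuple[str, str]            # (word, gold tag)
Conditions = Tuple[str, str, str]  # hypothetical attributes: (word, previous tag, next tag)
Row = Tuple[Conditions, str]       # (condition attributes, decision = gold tag)


def build_table(tokens: List[Token], multi_category: set) -> List[Row]:
    """One decision-table row per occurrence of a multi-category word."""
    rows = []
    for i, (word, tag) in enumerate(tokens):
        if word in multi_category:
            prev_tag = tokens[i - 1][1] if i > 0 else "<S>"
            next_tag = tokens[i + 1][1] if i + 1 < len(tokens) else "</S>"
            rows.append(((word, prev_tag, next_tag), tag))
    return rows


def indiscernibility_classes(rows: List[Row]) -> Dict[Conditions, List[str]]:
    """Group rows with identical condition attributes, keeping their decisions."""
    classes: Dict[Conditions, List[str]] = defaultdict(list)
    for cond, decision in rows:
        classes[cond].append(decision)
    return classes


def certain_rules(rows: List[Row], min_support: int = 1) -> Dict[Conditions, str]:
    """Rules from classes contained in the lower approximation of a single tag.

    Inconsistent classes (boundary region) produce no rule; min_support is a
    hypothetical frequency threshold.
    """
    rules = {}
    for cond, decisions in indiscernibility_classes(rows).items():
        if len(set(decisions)) == 1 and len(decisions) >= min_support:
            rules[cond] = decisions[0]
    return rules


def dependency_degree(rows: List[Row]) -> float:
    """gamma = |POS| / |U|: fraction of rows whose conditions determine the tag."""
    if not rows:
        return 0.0
    classes = indiscernibility_classes(rows)
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)


if __name__ == "__main__":
    gold = [("这", "r"), ("项", "q"), ("研究", "n"), ("很", "d"), ("重要", "a"), ("。", "w"),
            ("我们", "r"), ("研究", "v"), ("语料", "n"), ("。", "w"),
            ("他", "r"), ("研究", "v"), ("语料", "n"), ("。", "w"),
            ("本", "r"), ("研究", "n"), ("课题", "n"), ("。", "w")]
    rows = build_table(gold, {"研究"})
    print(certain_rules(rows))      # {('研究', 'q', 'd'): 'n'}
    print(dependency_degree(rows))  # 0.25
```

In this toy table the context (pronoun before, noun after) occurs with both `v` and `n`, so it lands in the boundary region and produces no certain rule; resolving it would require additional or different condition attributes.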
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.
