
An Automatic Part-of-Speech Tagging Algorithm for Chinese Combining Statistics and Rules

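Cite this article:	ZHANG Min, LI Sheng, ZHAO Tie-jun, ZHANG Yan-feng. An automatic part-of-speech tagging algorithm for Chinese combining statistics and rules [J]. Journal of Software (软件学报), 1998, 9(2): 134-138.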

Authors:	ZHANG Min (张民), LI Sheng (李生), ZHAO Tie-jun (赵铁军), ZHANG Yan-feng (张艳风)

Affiliation:	Department of Computer Science and Engineering, Harbin Institute of Technology, 150001 (all four authors)

Funding:	This research was supported by the National 863 High-Tech Program of China.

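Abstract:	This paper proposes and implements an automatic part-of-speech tagging algorithm for Chinese that combines statistics and rules, giving priority to quantitative statistical analysis. The algorithm introduces the concept of confidence intervals. It first applies the high-accuracy quantitative statistical technique, then uses rules to tag the remaining corpus and to correct some of the statistical tagging errors. Closed and open tests show that, without accounting for unknown words or word-segmentation errors, the algorithm achieves accuracies of 98.9% and 98.1%, respectively.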
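The abstract does not say exactly how the confidence intervals are used. One plausible reading is that a statistical tag is trusted only when the confidence interval around its estimated probability is both narrow and high. Otherwise the token is left for the rule stage. The sketch below illustrates that idea under this assumption only. It uses a Wilson score interval over a word's observed tag counts and a hypothetical `threshold` parameter. Neither the interval type nor the threshold comes from the paper, and the sketch does not reproduce the paper's hidden Markov model.

```python
import math
from collections import Counter
from typing import Optional


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95% confidence)."""
    if n == 0:
        # No observations: the proportion is completely unconstrained.
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def confident_tag(tag_counts: Counter, threshold: float = 0.9, z: float = 1.96) -> Optional[str]:
    """Return the word's most frequent tag if the lower bound of its estimated
    probability exceeds the hypothetical `threshold`; otherwise return None,
    meaning the token is deferred to the rule-based stage."""
    n = sum(tag_counts.values())
    if n == 0:
        return None
    tag, count = tag_counts.most_common(1)[0]
    low, _ = wilson_interval(count, n, z)
    return tag if low > threshold else None
```

Suppose a word has 980 noun and 20 verb occurrences. The lower bound (about 0.97) clears a 0.9 threshold, so the noun tag is accepted. A word seen only five times, all as a noun, still has a lower bound of only about 0.57, so it is deferred. Small samples are therefore not trusted even when the observed frequency is 100%.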
Keywords:	Chinese part-of-speech tagging; hidden Markov model; rules; confidence interval
Received:	1996-08-21
Revised:	1997-03-20
Part of Speech Tagging Chinese Corpus Based on Statistics and Rules

ZHANG Min, LI Sheng, ZHAO Tie-jun and ZHANG Yan-feng. Part of Speech Tagging Chinese Corpus Based on Statistics and Rules [J]. Journal of Software, 1998, 9(2): 134-138.

Authors:	ZHANG Min, LI Sheng, ZHAO Tie-jun and ZHANG Yan-feng

Affiliation:	Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001

Abstract:	This paper proposes an algorithm for automatically tagging the parts of speech (POS) of Chinese words. The algorithm integrates statistical and rule-based techniques and gives priority to quantitative statistical analysis. It employs confidence intervals in parameter estimation, which lets the high-accuracy quantitative statistical technique take top priority when tagging a corpus. The untagged remainder of the corpus is then tagged by rules, which also correct some of the errors made by the statistical stage. Closed and open tests show accuracies of 98.9% and 98.1%, respectively, without considering unknown words or segmentation errors.
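The rule-based second stage described in the abstract can be sketched as follows. The sketch assumes that the statistical stage has already tagged the tokens it is confident about and left `None` for the rest. Several pieces are hypothetical and come from neither the paper nor its abstract, which does not describe the rule formalism or tag set:

- the `Rule` type, consisting of a word, a required previous tag, and the tag to assign;
- the `correct_existing` flag;
- the `default_tag` fallback;
- the tag names `r` (pronoun), `v` (verb), `n` (noun), `d` (adverb) and `a` (adjective).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical contextual rule: if the token is `word` and the previous
    tag is `prev_tag` (None = any), assign `tag`."""
    word: str
    prev_tag: Optional[str]
    tag: str
    correct_existing: bool = False  # may this rule override a statistical tag?


def apply_rules(words: list[str], stat_tags: list[Optional[str]],
                rules: list[Rule], default_tag: str = "n") -> list[str]:
    """Fill tokens the statistical stage left untagged (None) and let
    correcting rules override statistical tags. Rules are tried in list
    order; the first applicable rule wins."""
    tags = list(stat_tags)
    for i, word in enumerate(words):
        prev = tags[i - 1] if i > 0 else "<s>"  # sentence-start marker
        for rule in rules:
            if rule.word != word:
                continue
            if rule.prev_tag is not None and rule.prev_tag != prev:
                continue
            # Fill a gap, or correct an existing tag only if the rule allows it.
            if tags[i] is None or rule.correct_existing:
                tags[i] = rule.tag
                break
        if tags[i] is None:
            tags[i] = default_tag  # hypothetical fallback when no rule applies
    return tags
```

Rules that only fill gaps never override statistical output. A rule marked `correct_existing=True` models the paper's use of rules to correct some statistical errors. In practice, the correcting rules would need richer context than a single previous tag to avoid introducing new errors.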

Keywords:	Chinese part of speech tagging; hidden Markov model; rule; confidence intervals
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
