
The Discussion of Disambiguation Method to the Chinese POS Tagging (汉语词性标注排歧方法探讨)

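Cite this article:	WANG Suge, ZHANG Yongkui. The Discussion of Disambiguation Method to the Chinese POS Tagging[J]. Computer Engineering and Applications, 2001, 37(7): 70-72.

Authors:	WANG Suge, ZHANG Yongkui

Affiliation:	Department of Computer Science, Shanxi University

Funding:	National Natural Science Foundation of China (No. 69575011); National "863" Program (No. 863-306-ZT03-03-1); Natural Science Foundation of Shanxi Province (No. 991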

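Abstract:	This paper applies the statistical bigram and trigram models to automatic part-of-speech tagging of Chinese. Using an algorithm with linear time complexity, closed and open tests are carried out with a training set of size 200,000 and a test set of size 10,000, and the zero elements of the sparse matrix and the tagging results are analyzed statistically.
Keywords:	part-of-speech tagging; co-occurrence probability matrix; corpus; statistical model
Article ID:	1002-8331-(2001)07-0070-03
Revision received:	September 1, 2000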
The Discussion of Disambiguation Method to the Chinese POS Tagging

WANG Suge, ZHANG Yongkui. The Discussion of Disambiguation Method to the Chinese POS Tagging[J]. Computer Engineering and Applications, 2001, 37(7): 70-72.

Authors:	WANG Suge, ZHANG Yongkui

Abstract:	In this paper, statistical bigram and trigram models are applied to Chinese part-of-speech tagging. Using an algorithm with O(n) time complexity, closed and open tests were carried out with a training corpus of 200,000 characters and a test set of 10,000 characters. Finally, the zero elements of the sparse matrix and the tagging results were analyzed statistically.

Keywords:	part-of-speech tagging; co-occurrence frequency matrix; corpus; statistical model
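
To make the abstract's terms concrete, the following is a minimal sketch of bigram part-of-speech disambiguation with Viterbi decoding. It is not the authors' implementation: the toy corpus, the tag symbols (r, v, n, a), the function names, and the use of add-one smoothing to handle zero cells in the sparse count matrices are all assumptions made for illustration. The trigram model mentioned in the abstract is not shown.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical sentence-start marker


def train_bigram(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counts, context, event, n_outcomes):
    """Add-one smoothed log P(event | context), finite even for zero cells."""
    row = counts.get(context, {})
    return math.log((row.get(event, 0) + 1) / (sum(row.values()) + n_outcomes))


def viterbi(words, tags, vocab_size, trans, emit):
    """Return the most probable tag sequence under the bigram model.

    The number of steps grows linearly with the sentence length for a
    fixed tag set and vocabulary.
    """
    if not words:
        return []
    n_tags = len(tags)
    scores = {START: 0.0}  # best log-score of any path ending in each tag
    backptrs = []          # one {tag: best previous tag} dict per word
    for word in words:
        new_scores, ptrs = {}, {}
        for tag in tags:
            prev = max(
                scores,
                key=lambda p: scores[p] + log_prob(trans, p, tag, n_tags),
            )
            new_scores[tag] = (
                scores[prev]
                + log_prob(trans, prev, tag, n_tags)
                + log_prob(emit, tag, word, vocab_size + 1)  # +1: unseen-word slot
            )
            ptrs[tag] = prev
        scores = new_scores
        backptrs.append(ptrs)
    # Follow back-pointers from the best final tag to recover the path.
    tag = max(scores, key=scores.get)
    path = [tag]
    for ptrs in reversed(backptrs[1:]):
        tag = ptrs[tag]
        path.append(tag)
    return path[::-1]


# Toy training data (hypothetical): "学习" occurs both as a verb and as a noun.
corpus = [
    [("我", "r"), ("爱", "v"), ("学习", "n")],
    [("我", "r"), ("学习", "v"), ("汉语", "n")],
    [("学习", "n"), ("重要", "a")],
]
tags = ["r", "v", "n", "a"]
vocab = {word for sentence in corpus for word, _ in sentence}
trans, emit = train_bigram(corpus)

result = viterbi(["我", "学习", "汉语"], tags, len(vocab), trans, emit)
print(result)
```

In this toy corpus "学习" is seen more often as a noun than as a verb, yet the transition counts after the pronoun "我" lead the decoder to tag it as a verb in the example sentence, which is the kind of context-based disambiguation the abstract refers to.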
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
