
A Method for Resolving Crossing Ambiguities in Chinese Word Segmentation Based on Combining SVM and k-NN

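Cite this article:	LI Rong, LIU Shaohui, YE Shiwei, SHI Zhongzhi. A Method for Resolving Crossing Ambiguities in Chinese Word Segmentation Based on Combining SVM and k-NN [J]. Journal of Chinese Information Processing, 2001, 15(6): 14-19.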

Authors:	LI Rong, LIU Shaohui, YE Shiwei, SHI Zhongzhi

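Affiliation:	1. Department of Computer Teaching, Graduate School (Beijing), University of Science and Technology of China; 2. Open Laboratory of Intelligence, Institute of Computing Technology, Chinese Academy of Sciences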

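Abstract:	This paper proposes a classification method that combines support vector machines (SVM) with k-nearest neighbors (k-NN) to resolve crossing pseudo-ambiguity strings. The segmentation of crossing pseudo-ambiguity strings is first formalized as a classification process, and a representation for ambiguous strings is given. The solution is a supervised learning process: a set of high-frequency pseudo-ambiguity strings is selected from the ambiguous strings, segmented correctly by hand, and used to train the SVM. A string to be disambiguated is then segmented by the combined SVM and k-NN classifier. Experiments show that the method correctly handles 91.6% of crossing ambiguity strings and that the algorithm is reasonably stable.
Keywords:	support vector; class representative point; crossing ambiguity; automatic Chinese word segmentation
Revised:	July 2, 2001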
A Method of Crossing Ambiguities in Chinese Word Segmentation Based on SVM and k-NN

LI Rong, LIU Shaohui, YE Shiwei, SHI Zhongzhi. A Method of Crossing Ambiguities in Chinese Word Segmentation Based on SVM and k-NN [J]. Journal of Chinese Information Processing, 2001, 15(6): 14-19.

Authors:	LI Rong, LIU Shaohui, YE Shiwei, SHI Zhongzhi

Affiliation:	1. Department of Computer Teaching, Graduate School, University of Science and Technology of China; 2. Institute of Computing Technology, Chinese Academy of Sciences

Abstract:	This paper presents an algorithm that combines the Support Vector Machine (SVM) and k-nearest neighbor (k-NN) to resolve crossing ambiguities in Chinese word segmentation. We treat ambiguity resolution as a classification problem and propose a vector representation of ambiguous strings. The solution is found by supervised learning: after the ambiguous strings are selected and classified by hand, the high-frequency ones are used to train the SVM. Each test string is then classified by the combined algorithm. Experiments show that the method reaches a correct rate of 91.6% on crossing ambiguities and that its performance is stable.

Keywords:	support vector; representative point; crossing ambiguities; Chinese automatic word segmentation
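
The abstract does not say how the SVM and k-NN are combined. One plausible reading, suggested by the keywords "support vector" and "representative point", is that a string far from the SVM decision boundary takes the SVM's label, while a string close to the boundary is decided by a k-NN vote among the support vectors. The Python sketch below illustrates only that decision step under this assumption. The weights `W`, the bias `B`, the support vectors, the threshold `eps`, the 2-D feature vectors, and the use of Euclidean distance are all hypothetical stand-ins, not values or choices taken from the paper, and SVM training is omitted.

```python
import math
from collections import Counter

# Hypothetical stand-in for a trained linear SVM: f(x) = W . x + B.
W = (1.0, 0.0)
B = -1.0

# Hypothetical support vectors, used as class representative points.
# Labels +1 and -1 stand for the two candidate segmentations of a
# crossing string ABC, for example "AB/C" and "A/BC".
SUPPORT_VECTORS = [
    ((2.0, 0.0), +1),
    ((2.0, 2.0), +1),
    ((0.0, 0.0), -1),
    ((0.0, 2.0), -1),
]


def svm_decision(x):
    """Signed score of x under the assumed linear SVM."""
    return sum(wi * xi for wi, xi in zip(W, x)) + B


def knn_label(x, k=3):
    """Majority label among the k support vectors nearest to x (Euclidean)."""
    nearest = sorted(SUPPORT_VECTORS, key=lambda sv: math.dist(x, sv[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(x, eps=0.5, k=3):
    """Return (label, branch): SVM when |f(x)| > eps, otherwise k-NN."""
    score = svm_decision(x)
    if abs(score) > eps:
        return (1 if score > 0 else -1), "svm"
    return knn_label(x, k), "knn"


if __name__ == "__main__":
    for point in [(3.0, 1.0), (0.9, 1.0), (1.2, 0.0)]:
        print(point, classify(point))
```

Keeping `k` odd avoids tied votes in the two-class case. In a real system the feature vectors would come from the paper's representation of ambiguous strings, which the abstract does not describe.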
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
