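A Segmentation Method for Chinese Crossing Ambiguities Based on Combining SVM and k-NN
Cite this article: Li Rong, Liu Shaohui, Ye Shiwei, Shi Zhongzhi. A Segmentation Method for Chinese Crossing Ambiguities Based on Combining SVM and k-NN [J]. Journal of Chinese Information Processing (中文信息学报), 2001, 15(6): 14-19.
Authors: Li Rong, Liu Shaohui, Ye Shiwei, Shi Zhongzhi
Affiliations: 1. Computer Teaching Department, Graduate School (Beijing), University of Science and Technology of China; 2. Intelligent Open Laboratory, Institute of Computing Technology, Chinese Academy of Sciences
Abstract: This paper proposes a classification method that combines support vector machines (SVM) with k-nearest neighbors (k-NN) to resolve crossing pseudo-ambiguity strings. The disambiguation of a crossing pseudo-ambiguity string is first formalized as a classification task, and a representation for ambiguous strings is given. The solution is a supervised learning process: a set of high-frequency pseudo-ambiguity strings is selected from the ambiguous strings, segmented correctly by hand, and used to train the SVM. A string to be disambiguated is then segmented by the combined SVM and k-NN classification algorithm. Experimental results show that the method correctly handles 91.6% of crossing ambiguity strings and that the algorithm has a degree of stability.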

Keywords: support vector; class representative point; crossing ambiguity; automatic Chinese word segmentation
Revised: July 2, 2001

A Method of Crossing Ambiguities in Chinese Word Segmentation Based on SVM and k-NN
LI Rong, LIU Shao-hui, YE Shi-wei, SHI Zhong-zhi. A Method of Crossing Ambiguities in Chinese Word Segmentation Based on SVM and k-NN [J]. Journal of Chinese Information Processing, 2001, 15(6): 14-19.
Authors: LI Rong, LIU Shao-hui, YE Shi-wei, SHI Zhong-zhi
Affiliation: 1. Department of Computer, Graduate School, University of Science and Technology of China; 2. Institute of Computing Technology, Chinese Academy of Sciences
Abstract: This paper presents an algorithm that combines Support Vector Machines (SVM) and k-nearest neighbor (k-NN) classification to resolve crossing ambiguities in Chinese word segmentation. We treat the disambiguation of a crossing-ambiguity string as a classification problem and propose a vector representation for such strings. The solution is found by supervised learning: high-frequency ambiguous strings are selected, segmented by hand, and used to train the SVM. A test string is then classified by the combined SVM and k-NN algorithm. Experiments show that the method correctly segments 91.6% of crossing ambiguities and that its performance is stable.
Keywords: support vector; representative point; crossing ambiguities; Chinese automatic segmentation
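
The abstract does not spell out how the SVM and k-NN classifiers are combined. One plausible reading, suggested by the keywords "support vector" and "representative point", is that the SVM decides on its own when a sample lies far from the separating hyperplane, and a k-NN vote over the support vectors (acting as class representative points) decides when the sample lies close to it. The Python sketch below illustrates that reading only. The linear decision function, the threshold `eps`, the value of `k`, the toy vectors, and the label convention (+1 for segmenting a string ABC as AB/C, -1 for A/BC) are all assumptions made for illustration and are not taken from the paper.

```python
import math

def decision_value(w, b, x):
    """Decision function of an already trained linear SVM: g(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def knn_vote(x, points, labels, k):
    """Majority vote (labels are +1 or -1) among the k points nearest to x."""
    ranked = sorted(zip(points, labels), key=lambda pl: math.dist(x, pl[0]))
    total = sum(label for _, label in ranked[:k])
    return 1 if total >= 0 else -1

def hybrid_classify(x, w, b, support_vectors, sv_labels, eps=0.5, k=3):
    """Assumed combination rule: trust the SVM far from the hyperplane,
    fall back to k-NN over the support vectors near it."""
    g = decision_value(w, b, x)
    if abs(g) > eps:
        return 1 if g > 0 else -1
    return knn_vote(x, support_vectors, sv_labels, k)

# Hypothetical trained model: hyperplane x0 - x1 = 0 and four support vectors.
W = [1.0, -1.0]
B = 0.0
SUPPORT_VECTORS = [(1.0, 0.0), (2.0, 1.0), (0.0, 1.0), (1.0, 2.0)]
SV_LABELS = [1, 1, -1, -1]  # +1: segment as AB/C, -1: segment as A/BC (assumed)

far_sample = (3.0, 0.0)    # |g| = 3.0, decided by the SVM alone
near_sample = (1.0, 1.2)   # |g| = 0.2, decided by the k-NN vote
print(hybrid_classify(far_sample, W, B, SUPPORT_VECTORS, SV_LABELS))
print(hybrid_classify(near_sample, W, B, SUPPORT_VECTORS, SV_LABELS))
```

Restricting the k-NN search to the support vectors keeps the fallback cheap, since only a small subset of the training strings has to be compared against each ambiguous string. Whether the paper uses this exact rule cannot be confirmed from the abstract.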
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.