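A KNN Method for Chinese Text Categorization Using Vector Aggregation
Cite this article: LI Ying, ZHANG Xiao-Hui, WANG Hua-Yong, CHANG Gui-Ran. A KNN method for Chinese text categorization using vector aggregation [J]. Mini-micro Systems (小型微型计算机系统), 2004, 25(6): 993-996.
Authors: LI Ying (李莹), ZHANG Xiao-Hui (张晓辉), WANG Hua-Yong (王华勇), CHANG Gui-Ran (常桂然)
Affiliation: Computing Center, Northeastern University, Shenyang, Liaoning 110004, China
Funding: Supported by the National "863" High-Tech Program of China (863-306-ZD02-02-6)
Abstract: The KNN text categorization method does not take associations among feature words into account. This paper proposes an improved method to address that problem. Based on an analysis of the distribution of CHI statistic values, which reflect the degree of correlation between words and categories, the method applies a vector aggregation technique to extract associated feature words. Its distinguishing feature is that associated feature words in a text vector are aggregated into a single feature item, replacing the traditional practice of mapping each feature word to one dimension of the vector. This both reduces the dimensionality of the vector and strengthens the contribution of each feature item to classification. Experiments show that the method markedly improves classification precision and recall.

Keywords: KNN; Chinese text categorization; vector aggregation
Article ID: 1000-1220(2004)06-0993-04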
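
For reference, the abstract does not state the CHI formula itself. The form commonly used for feature selection in text categorization, and assumed here, is the chi-square statistic of a term t and a category c over a 2x2 contingency table. The symbols A, B, C, D, and N are introduced for illustration and are not taken from the paper: A is the number of documents in c that contain t, B the number outside c that contain t, C the number in c that lack t, D the number outside c that lack t, and N = A + B + C + D.

```latex
\chi^2(t, c) = \frac{N\,(AD - CB)^2}{(A + C)\,(B + D)\,(A + B)\,(C + D)}
```

A larger value indicates a stronger dependence between the term and the category. The statistic is zero when the two are independent.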

Vector-Combination-Applied KNN Method for Chinese Text Categorization
LI Ying, ZHANG Xiao-Hui, WANG Hua-Yong, CHANG Gui-Ran. Vector-Combination-Applied KNN Method for Chinese Text Categorization [J]. Mini-micro Systems, 2004, 25(6): 993-996.
Authors: LI Ying, ZHANG Xiao-Hui, WANG Hua-Yong, CHANG Gui-Ran
Abstract: Traditional KNN (k-nearest neighbor) text categorization does not consider associations among words. This paper proposes an improved KNN method for Chinese text categorization. The method applies a vector-combination technique to extract associated discriminating words according to the distribution of the CHI statistic, which indicates the relationship between words and classes. Its main merit is that associated discriminating words are combined into a single feature, replacing the traditional one-word-per-dimension representation. This not only reduces the dimensionality of the text vector but also strengthens each feature's contribution to categorization. Experiments show that the improvement clearly increases categorization recall and precision.
Keywords: KNN; Chinese text categorization; vector combination
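
The idea in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not give the aggregation rule, so the rule used here is an assumption. Words that score highest for the same class, with a CHI value at or above a threshold, are merged into one feature. All function names, the threshold, and the toy corpus are hypothetical. Because the chi-square statistic is symmetric in sign, the sketch also assumes that only positively associated (word, class) pairs should count, and scores the others as 0.

```python
from collections import Counter, defaultdict


def chi_square(n_docs, a, b, c, d):
    """Standard chi-square statistic for a term/class 2x2 contingency table."""
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n_docs * (a * d - c * b) ** 2 / denom if denom else 0.0


def chi_table(docs, labels):
    """Return {term: {label: chi}} for tokenized docs.

    Assumption: pairs where the term is not over-represented in the
    class (a*d <= c*b) are scored 0.0 so they cannot win a group.
    """
    n = len(docs)
    classes = sorted(set(labels))
    class_size = Counter(labels)
    doc_freq = defaultdict(Counter)  # doc_freq[term][label] = docs with term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term][label] += 1
    table = {}
    for term, per_class in doc_freq.items():
        total = sum(per_class.values())
        table[term] = {}
        for cls in classes:
            a = per_class[cls]        # in class, contains term
            b = total - a             # outside class, contains term
            c = class_size[cls] - a   # in class, lacks term
            d = n - a - b - c         # outside class, lacks term
            positive = a * d > c * b
            table[term][cls] = chi_square(n, a, b, c, d) if positive else 0.0
    return table


def aggregate_terms(table, threshold):
    """Hypothetical rule: group terms by their highest-scoring class."""
    groups = defaultdict(list)
    for term, scores in table.items():
        cls, score = max(scores.items(), key=lambda kv: kv[1])
        if score >= threshold:
            groups[cls].append(term)
    return {cls: sorted(terms) for cls, terms in groups.items()}


def to_vector(tokens, groups):
    """One dimension per group: summed counts of the group's terms."""
    counts = Counter(tokens)
    return [sum(counts[t] for t in groups[cls]) for cls in sorted(groups)]


# Toy corpus, already segmented into words (illustrative data only).
docs = [
    ["ball", "goal", "team"],
    ["goal", "team", "match"],
    ["stock", "bank", "market"],
    ["bank", "market", "rate"],
]
labels = ["sports", "sports", "finance", "finance"]

table = chi_table(docs, labels)
groups = aggregate_terms(table, threshold=2.0)
vector = to_vector(["goal", "team", "team", "bank"], groups)
```

Here each aggregated group takes the place of several single-word dimensions, so the vector passed to a KNN classifier has one entry per group rather than one per word. Real Chinese text would first need word segmentation, which this sketch leaves out.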
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.