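An Effective Text Classification Algorithm Based on Incremental Learning Vector Quantization
Citation: WANG Xiu-Jun, SHEN Hong. An effective text classification algorithm based on incremental learning vector quantization [J]. Chinese Journal of Computers, 2007, 30(8): 1277-1285.
Authors: WANG Xiu-Jun, SHEN Hong
Affiliation: Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230039
Funding note: The authors thank Professor Tan Ying for his valuable comments and suggestions on this paper.
Abstract: KNN is a simple classification method that is widely used in text classification, but it suffers from a heavy computational load and from reduced classification accuracy when the training documents are unevenly distributed. To address these problems, this paper builds on the incremental idea of minimizing the learning error and combines learning vector quantization (LVQ) with growing neural gas (GNG) into a new incremental learning vector quantization method, which it applies to text classification. The proposed algorithm produces an effective set of representative samples after a single, selective training pass over all training samples, and therefore has a strong learning ability. Experimental results show that the method not only reduces the test time of KNN but also maintains or even improves classification accuracy.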
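For reference, the KNN baseline whose cost the abstract criticizes can be sketched as follows. This is a minimal illustration, assuming documents are already represented as sparse term-weight dictionaries and compared by cosine similarity; the function names `cosine` and `knn_classify` and the toy data are hypothetical and not taken from the paper. The point it shows is that every query is scored against all N training documents, so test time grows linearly with the size of the training set.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k most similar training docs.

    Every call scores the query against all N training documents; this
    full scan is the per-query cost that prototype-based methods try to cut.
    """
    scored = sorted(training, key=lambda doc: cosine(query, doc[0]), reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: (term-weight dict, label) pairs.
training = [
    ({"goal": 1.0, "match": 1.0}, "sports"),
    ({"team": 1.0, "goal": 0.5}, "sports"),
    ({"stock": 1.0, "market": 1.0}, "finance"),
    ({"bank": 1.0, "stock": 0.5}, "finance"),
]
print(knn_classify({"goal": 1.0, "team": 0.5}, training, k=3))  # sports
```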

Keywords: learning vector quantization (LVQ); growing neural gas (GNG); learning error; inter-class distance; learning probability
Revised: 2007-01-18

Improved Growing Learning Vector Quantification for Text Classification
WANG Xiu-Jun, SHEN Hong. Improved Growing Learning Vector Quantification for Text Classification[J]. Chinese Journal of Computers, 2007, 30(8): 1277-1285.
Authors: WANG Xiu-Jun, SHEN Hong
Affiliation: Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230039
Abstract: As a simple classification method, KNN has been widely applied in text classification. However, KNN-based text classification has two problems: a heavy computational load, and reduced classification accuracy caused by the non-uniform distribution of training samples. To solve these problems, the authors combine LVQ and GNG, guided by minimizing the increase in learning error, into a new growing LVQ method and apply it to text classification. The method generates an effective representative sample set after a single, selective training pass over the training set, and hence has a strong learning ability. Experimental results show that this method not only reduces the testing time of KNN but also maintains or even improves classification accuracy.
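The idea of condensing the training set into a smaller set of prototypes in one selective pass can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: the paper's actual insertion and update criteria (learning error, inter-class distance, learning probability) are not given in this abstract, so the rule below is a hypothetical stand-in that applies an LVQ1-style attraction update when the nearest prototype has the correct label and inserts a new prototype (the "growing" step) when it does not. The names `grow_prototypes`, `classify`, the learning rate `lr`, and the use of dense vectors with squared Euclidean distance are all assumptions.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grow_prototypes(samples, lr=0.1):
    """One selective pass over (vector, label) pairs; returns [vector, label] prototypes.

    Hypothetical rule: if the nearest prototype carries the sample's label,
    move it a fraction `lr` toward the sample (LVQ1-style attraction);
    otherwise the sample's region is not yet represented, so add the sample
    itself as a new prototype.
    """
    prototypes = []
    for x, y in samples:
        if not prototypes:
            prototypes.append([list(x), y])
            continue
        nearest = min(prototypes, key=lambda p: _dist2(p[0], x))
        if nearest[1] == y:
            nearest[0] = [w + lr * (xi - w) for w, xi in zip(nearest[0], x)]
        else:
            prototypes.append([list(x), y])
    return prototypes


def classify(x, prototypes):
    """1-NN against the prototype set instead of the full training set."""
    return min(prototypes, key=lambda p: _dist2(p[0], x))[1]


# Hypothetical toy data: two well-separated classes in 2-D.
samples = [
    ([0.0, 0.0], "a"),
    ([0.1, 0.0], "a"),
    ([1.0, 1.0], "b"),
    ([0.9, 1.0], "b"),
    ([0.0, 0.1], "a"),
]
protos = grow_prototypes(samples)
print(len(protos), classify([0.2, 0.1], protos), classify([0.8, 0.9], protos))  # 2 a b
```

In this sketch a test vector is compared only against the prototypes rather than against every training sample, which is where a reduction in KNN test time would come from; how well accuracy holds up depends on the insertion rule, which is exactly what the paper's criteria are designed to control.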
Keywords: learning vector quantification; growing neural gas; learning error; inter-class distance; learning probability
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
