A Density-Based Method for Reducing the Amount of Training Data in kNN Text Classification
Cite this article: LI Rong-Lu, HU Yun-Fa. A Density-Based Method for Reducing the Amount of Training Data in kNN Text Classification[J]. Journal of Computer Research and Development, 2004, 41(4): 539-545.
Authors: LI Rong-Lu  HU Yun-Fa
Affiliation: Department of Computing and Information Technology, Fudan University, Shanghai 200433, China
Funding: National Natural Science Foundation of China (60173027)
Abstract: With the rapid development of the WWW, text classification has become a key technology for organizing and processing large volumes of document data. As a simple, effective, nonparametric classification method, kNN is widely used in text classification. However, the method is computationally expensive, and an uneven distribution of training samples degrades classification accuracy. To address these two problems, a density-based method for pruning the training samples of a kNN classifier is proposed. It not only reduces the computational cost of kNN, but also makes the density of the training samples more uniform, thereby reducing the misclassification of test samples near class boundaries. Experimental results show that the method performs well.

Keywords: text classification  kNN  fast classification

A Density-Based Method for Reducing the Amount of Training Data in kNN Text Classification
LI Rong-Lu and HU Yun-Fa. A Density-Based Method for Reducing the Amount of Training Data in kNN Text Classification[J]. Journal of Computer Research and Development, 2004, 41(4): 539-545.
Authors: LI Rong-Lu and HU Yun-Fa
Abstract: With the rapid development of the World Wide Web, text classification has become a key technology for organizing and processing large amounts of document data. As a simple, effective and nonparametric classification method, the kNN method is widely used in document classification. However, the kNN classifier not only has large computational demands, but may also lose classification precision when the density of the training data is uneven. In this paper, a density-based method for reducing the amount of training data is presented, which addresses both problems: it reduces the computational demands of the kNN classifier, and it evens out the density of the training data, decreasing misclassification near the boundaries between classes. Experiments show that the method performs well.
Keywords: text classification  k-nearest neighbor  fast classification
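
The abstract describes the method only at a high level. The following is a minimal Python sketch of one way a density-based pruning step for a kNN training set could look; the pruning rule (drop samples whose k-NN radius is well below their class average and whose k nearest neighbours all share their label), the function name density_prune, and the parameters k and ratio are illustrative assumptions, not the authors' exact algorithm.

import numpy as np

def density_prune(X, y, k=10, ratio=1.5):
    """Prune training samples that lie in over-dense, homogeneous regions.

    Illustrative sketch only: a sample is dropped when its k-NN radius is
    much smaller than its class's average radius (an over-dense region) and
    all k of its nearest neighbours share its label, so removing it should
    not shift the class boundary. `ratio` controls how aggressively dense
    regions are thinned; both the rule and the parameter are assumptions.
    """
    # Pairwise Euclidean distances; for tf-idf text vectors cosine distance
    # would be the more usual choice.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)

    knn_idx = np.argsort(d, axis=1)[:, :k]                  # k nearest neighbours of each sample
    radius = np.take_along_axis(d, knn_idx, axis=1)[:, -1]  # distance to the k-th neighbour

    keep = np.ones(len(X), dtype=bool)
    for c in np.unique(y):
        members = np.where(y == c)[0]
        avg_radius = radius[members].mean()
        for i in members:
            over_dense = radius[i] * ratio < avg_radius  # much denser than the class average
            homogeneous = np.all(y[knn_idx[i]] == c)     # far from any class boundary
            if over_dense and homogeneous:
                keep[i] = False
    return X[keep], y[keep]

A kNN classifier would then perform its neighbour search on the reduced set returned by density_prune, which is both smaller and more evenly distributed than the original training data.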