A Density-Based Method for Reducing the Amount of Training Data in kNN Text Classification
Cite this article: LI Rong-Lu, HU Yun-Fa. A Density-Based Method for Reducing the Amount of Training Data in kNN Text Classification[J]. Journal of Computer Research and Development, 2004, 41(4): 539-545.
Authors: LI Rong-Lu  HU Yun-Fa
Affiliation: Department of Computing and Information Technology, Fudan University, Shanghai 200433, China
Funding: National Natural Science Foundation of China (60173027)
Abstract: With the rapid development of the WWW, text classification has become a key technology for organizing and processing large volumes of document data. As a simple, effective, nonparametric classification method, kNN is widely used in text classification. However, the method is computationally expensive, and an uneven distribution of training samples degrades classification accuracy. To address these two problems, a density-based method for pruning the training samples of a kNN classifier is proposed. It not only reduces the computational cost of kNN, but also makes the density of the training samples more uniform, thereby reducing the misclassification of test samples near class boundaries. Experimental results show that the method performs well.

Keywords: text classification  kNN  fast classification

A Density-Based Method for Reducing the Amount of Training Data in kNN Text Classification
LI Rong-Lu and HU Yun-Fa. A Density-Based Method for Reducing the Amount of Training Data in kNN Text Classification[J]. Journal of Computer Research and Development, 2004, 41(4): 539-545.
Authors: LI Rong-Lu and HU Yun-Fa
Abstract: With the rapid development of the World Wide Web, text classification has become a key technology for organizing and processing large amounts of document data. As a simple, effective and nonparametric classification method, the kNN method is widely used in document classification. However, the kNN classifier not only has large computational demands, but may also lose classification precision when the density of the training data is uneven. In this paper, a density-based method for reducing the amount of training data is presented, which addresses both problems: it reduces the computational demands of the kNN classifier, and it evens out the density of the training data, decreasing misclassification near the boundaries between classes. Experiments show that the method performs well.
Keywords: text classification  k-nearest neighbor  fast classification
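
The abstract describes the method only at a high level. The following is a minimal Python sketch of one way a density-based pruning step for a kNN training set could look; the pruning rule (drop samples whose k-NN radius is well below their class average and whose k nearest neighbours all share their label), the function name density_prune, and the parameters k and ratio are illustrative assumptions, not the authors' exact algorithm.

import numpy as np

def density_prune(X, y, k=10, ratio=1.5):
    """Prune training samples that lie in over-dense, homogeneous regions.

    Illustrative sketch only: a sample is dropped when its k-NN radius is
    much smaller than its class's average radius (an over-dense region) and
    all k of its nearest neighbours share its label, so removing it should
    not shift the class boundary. `ratio` controls how aggressively dense
    regions are thinned; both the rule and the parameter are assumptions.
    """
    # Pairwise Euclidean distances; for tf-idf text vectors cosine distance
    # would be the more usual choice.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)

    knn_idx = np.argsort(d, axis=1)[:, :k]                  # k nearest neighbours of each sample
    radius = np.take_along_axis(d, knn_idx, axis=1)[:, -1]  # distance to the k-th neighbour

    keep = np.ones(len(X), dtype=bool)
    for c in np.unique(y):
        members = np.where(y == c)[0]
        avg_radius = radius[members].mean()
        for i in members:
            over_dense = radius[i] * ratio < avg_radius  # much denser than the class average
            homogeneous = np.all(y[knn_idx[i]] == c)     # far from any class boundary
            if over_dense and homogeneous:
                keep[i] = False
    return X[keep], y[keep]

A kNN classifier would then perform its neighbour search on the reduced set returned by density_prune, which is both smaller and more evenly distributed than the original training data.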