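Research on an Automatic Chinese Web Page Classification Method Based on an Improved kNN Algorithm (基于改进的kNN算法的中文网页自动分类方法研究)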

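Cite this article:	HU Yan, WU Huzi, ZHONG Luo. Research on an automatic Chinese Web page classification method based on an improved kNN algorithm [J]. Engineering Journal of Wuhan University (武汉大学学报(工学版)), 2007, 40(4): 141-144.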

Authors:	HU Yan (胡燕), WU Huzi (吴虎子), ZHONG Luo (钟珞)

Affiliation:	School of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei 430070, China

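Abstract:	This paper outlines the general process of Chinese Web page classification and focuses on the key issues in that process: feature word extraction, construction of the training corpus, and the text classification algorithm. Because the number of feature words in the vector space model's text representation is closely tied to the efficiency of the classification algorithm, a feature word extraction method based on part of speech is proposed. In addition, the traditional method of comparing feature vectors is incorporated into the text similarity computation to improve the kNN algorithm. The result is an improved kNN algorithm based on feature word reduction, which raises the efficiency and performance of classification.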
Keywords:	feature words; training corpus; text similarity; kNN algorithm
Article ID:	1671-8844(2007)04-0141-04
Revision date:	2007-03-25
Research of Chinese Web classification method based on improved kNN algorithm

HU Yan, WU Huzi, ZHONG Luo. Research of Chinese Web classification method based on improved kNN algorithm [J]. Engineering Journal of Wuhan University, 2007, 40(4): 141-144.

Authors:	HU Yan, WU Huzi, ZHONG Luo

Affiliation:	School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China

Abstract:	The general procedure of Chinese Web page classification is described, and its key issues, including feature selection, construction of the training collection, and the text categorization algorithm, are discussed in detail. In the text representation of the vector space model, the number of feature words is closely related to the efficiency of the classification algorithm, so a feature word extraction method based on part of speech is developed. By incorporating the traditional method of comparing feature vectors into the computation of text similarity, the k-nearest neighbor (kNN) algorithm is modified. The resulting algorithm, based on reducing the number of feature words and on data division, improves the efficiency and performance of classification.

Keywords:	characteristic words; training collection; similarity of the text; kNN algorithm
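
The abstract names two ideas without giving details: keeping only words of certain parts of speech as feature words, and classifying with kNN over a text similarity measure in the vector space model. The following is a minimal sketch of that general combination, not the authors' algorithm. It assumes the text has already been segmented and tagged by some external tool, that nouns and verbs (tags `n` and `v`) are the kept categories, that term frequency is the feature weight, and that cosine similarity is the similarity measure; all function names, the tag set, and the toy data are hypothetical.

```python
import math
from collections import Counter

# Assumption: only nouns ("n") and verbs ("v") are kept as feature words.
KEPT_TAGS = {"n", "v"}


def extract_features(tagged_tokens, kept_tags=KEPT_TAGS):
    """Build a term-frequency vector from (word, pos_tag) pairs,
    discarding every word whose tag is not in kept_tags."""
    return Counter(word for word, tag in tagged_tokens if tag in kept_tags)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Return the label with the largest summed similarity among the
    k training vectors most similar to the query vector.
    training is a list of (vector, label) pairs."""
    if not training:
        raise ValueError("training set is empty")
    scored = sorted(
        ((cosine_similarity(query, vec), label) for vec, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Toy, pre-tagged documents (illustrative data only).
tagged_training = [
    ([("football", "n"), ("match", "n"), ("win", "v"), ("the", "u")], "sports"),
    ([("team", "n"), ("score", "v"), ("goal", "n")], "sports"),
    ([("stock", "n"), ("market", "n"), ("rise", "v")], "finance"),
    ([("bank", "n"), ("invest", "v"), ("stock", "n")], "finance"),
]
training = [(extract_features(doc), label) for doc, label in tagged_training]

query = extract_features(
    [("team", "n"), ("win", "v"), ("match", "n"), ("very", "d")]
)
predicted = knn_classify(query, training, k=3)
```

Filtering by part of speech shrinks every vector before any similarity is computed, which is one plausible way a smaller feature set could reduce the cost of the kNN comparison step that the abstract associates with efficiency.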
This article is indexed in databases including CNKI, VIP (维普), and Wanfang Data (万方数据).
