A KNN Chinese Text Classification Algorithm Based on Center Documents

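Cite this article:	LU Ting, WANG Hao, YAO Hongliang. K-nearest neighbor Chinese text categorization algorithm based on center documents[J]. Computer Engineering and Applications, 2011, 47(2): 127-130.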

Authors:	LU Ting, WANG Hao, YAO Hongliang

Affiliation:	School of Computer and Information, Hefei University of Technology, Hefei 230009

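Funding:	National Natural Science Foundation of China; Natural Science Foundation of Anhui Province; Scientific Research and Development Fund of Hefei University of Technology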

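Abstract:	Given the vast volume of available data, automatic text classification has become an active research topic as a way to search for or extract material on a specific subject. KNN is an important automatic text classification method: it can handle large-scale data and is highly stable, but it classifies slowly. Building on KNN, the proposed method introduces the semantic relations between feature terms and clusters documents according to those relations to generate center documents. This reduces the number of documents KNN has to search and so speeds up classification. Simulation experiments show that the algorithm significantly improves classification speed without any loss of classification accuracy.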
Keywords:	Chinese text classification; k-nearest neighbor; center documents; semantic similarity; clustering
Received:	2009-4-27
Revised:	2009-6-19
K-nearest neighbor Chinese text categorization algorithm based on center documents

LU Ting, WANG Hao, YAO Hongliang. K-nearest neighbor Chinese text categorization algorithm based on center documents[J]. Computer Engineering and Applications, 2011, 47(2): 127-130.

Authors:	LU Ting, WANG Hao, YAO Hongliang

Affiliation:	Department of Computer Science and Technology, Hefei University of Technology, Hefei 230009, China

Abstract:	To search for or extract information on a specific topic from large data sources, automatic text categorization has become an active research area. KNN is an important automatic text categorization method: it can handle large data sets and is relatively stable, but it is slow. Building on KNN classification, this paper introduces the semantic relations between feature items and clusters documents according to those relations to build center documents. This reduces the number of documents that KNN must search and increases classification speed. Simulation results show that the proposed algorithm significantly improves speed without loss of classification precision.

Keywords:	Chinese text classification; k-Nearest Neighbor (KNN); center documents; semantic similarity; clustering
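
The abstract describes the approach only at a high level: cluster the training documents into center documents, then run KNN against those centers instead of against every training document. A minimal Python sketch of that idea follows, under loudly stated assumptions. Cosine similarity over term-weight dictionaries stands in for the paper's semantic similarity measure, and a single-pass threshold clustering stands in for its clustering step; neither is specified in the abstract. All function names, parameters, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_center_documents(labeled_docs, threshold=0.5):
    """Cluster documents within each class and return (center, label) pairs.

    ASSUMPTION: single-pass threshold clustering with cosine similarity;
    the paper's own clustering and semantic similarity are not given here.
    """
    clusters = defaultdict(list)  # label -> list of [term_sums, member_count]
    for vec, label in labeled_docs:
        best, best_sim = None, threshold
        for cluster in clusters[label]:
            sums, count = cluster
            centroid = {t: s / count for t, s in sums.items()}
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # No sufficiently similar cluster: the document starts a new one.
            clusters[label].append([dict(vec), 1])
        else:
            # Merge the document into the most similar cluster of its class.
            for t, w in vec.items():
                best[0][t] = best[0].get(t, 0.0) + w
            best[1] += 1
    return [
        ({t: s / count for t, s in sums.items()}, label)
        for label, groups in clusters.items()
        for sums, count in groups
    ]


def knn_classify(query, centers, k=3):
    """Similarity-weighted vote over the k center documents nearest to query."""
    nearest = sorted(centers, key=lambda c: cosine(query, c[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for center, label in nearest:
        votes[label] += cosine(query, center)
    return max(votes, key=votes.get)


# Toy data (hypothetical): four training documents collapse into two centers.
train = [
    ({"football": 1.0, "match": 1.0}, "sports"),
    ({"football": 1.0, "goal": 1.0}, "sports"),
    ({"stock": 1.0, "market": 1.0}, "finance"),
    ({"stock": 1.0, "bank": 1.0}, "finance"),
]
centers = build_center_documents(train, threshold=0.4)
print(len(centers), knn_classify({"goal": 1.0, "match": 1.0}, centers, k=1))
# prints: 2 sports
```

In this sketch the speed-up comes from the search set shrinking from four documents to two centers. On a real corpus the reduction would depend on the clustering threshold, which trades search cost against how faithfully the centers represent each class.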
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.