An Improved k-means Algorithm for Documents Clustering

Cite this article:	WAN Xiaojun, YANG Jianwu, CHEN Xiaoou. An Improved k-means Algorithm for Documents Clustering[J]. Computer Engineering, 2003, 29(2): 102-103, 157

Author names:	WAN Xiaojun, YANG Jianwu, CHEN Xiaoou

Affiliation (all three authors):	National Key Laboratory of Text Information Processing Technology, Institute of Computer Science, Peking University, Beijing 100871

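Abstract (translated from the Chinese):	The paper introduces the partition-based k-means algorithm for document clustering. k-means is well suited to processing massive document collections, but it is very sensitive to outliers. To address this, the paper proposes separating the cluster mean point from the cluster seed and gives a concrete improved k-means algorithm based on that idea. Experiments show that the improved algorithm is more accurate and more stable than the original k-means algorithm.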
Keywords:	document clustering; k-means algorithm; partition-based clustering algorithm
Article ID:	1000-3428(2003)02-0102-02
An Improved k-means Algorithm for Documents Clustering

WAN Xiaojun, YANG Jianwu, CHEN Xiaoou. An Improved k-means Algorithm for Documents Clustering[J]. Computer Engineering, 2003, 29(2): 102-103, 157

Authors:	WAN Xiaojun, YANG Jianwu, CHEN Xiaoou

Abstract:	This paper first introduces the partition-based k-means algorithm for document clustering. The k-means algorithm is well suited to processing very large document collections, but it is sensitive to outliers. The paper therefore proposes separating the cluster centroid from the cluster seed and presents an improved k-means algorithm based on this idea. Experimental results show that the improved algorithm is more accurate and more stable than the original k-means algorithm.

Keywords:	Document clustering; k-means algorithm; Partition-based clustering algorithm
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
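
The abstract names two things that a small sketch may make more concrete: plain k-means, whose cluster mean is pulled toward outliers, and the idea of keeping the seed used for reassignment separate from that mean. The record does not give the paper's actual seed rule, so the `trimmed_seed` function and its `keep_fraction` parameter below are hypothetical stand-ins chosen only for illustration, and plain Euclidean distance on toy 2-D points is likewise an assumption (real document vectors would be high-dimensional).

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def trimmed_seed(points, keep_fraction=0.8):
    """Hypothetical seed rule (an assumption, not the paper's):
    average only the members closest to the cluster mean."""
    centroid = mean(points)
    ranked = sorted(points, key=lambda p: math.dist(p, centroid))
    keep = max(1, round(keep_fraction * len(ranked)))
    return mean(ranked[:keep])

def assign(points, seeds):
    """Put each point into the cluster of its nearest seed."""
    clusters = [[] for _ in seeds]
    for p in points:
        nearest = min(range(len(seeds)), key=lambda i: math.dist(p, seeds[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, seeds, seed_fn=mean, max_iter=100):
    """Lloyd-style k-means. seed_fn maps a cluster's members to its next seed;
    the default (mean) gives the ordinary algorithm."""
    seeds = [list(s) for s in seeds]
    for _ in range(max_iter):
        clusters = assign(points, seeds)
        # An empty cluster keeps its previous seed.
        new_seeds = [seed_fn(c) if c else s for c, s in zip(clusters, seeds)]
        if new_seeds == seeds:
            break
        seeds = new_seeds
    return seeds, assign(points, seeds)

if __name__ == "__main__":
    # Four tightly grouped points plus one outlier.
    cluster = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
    print("mean:", mean(cluster))
    print("trimmed seed:", trimmed_seed(cluster))
```

On this toy cluster the ordinary mean lands at (2.4, 2.4), outside the tight group, while the trimmed seed stays at (0.5, 0.5). Passing `seed_fn=trimmed_seed` to `kmeans` swaps the rule in without touching the assignment step, which is one way to read "separating the seed from the mean"; the paper itself may define the seed differently.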