
An Incremental Clustering Algorithm for Large-Scale Data and Its Application on Document Clustering

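Cite this article:	Tang Chunsheng, Jin Yihui. An Incremental Clustering Algorithm for Large-Scale Data and Its Application on Document Clustering [J]. Computer Engineering and Applications, 2002, 38(11): 187-190, 195.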

Authors:	Tang Chunsheng, Jin Yihui

Affiliation:	Department of Automation, Tsinghua University, Beijing 100084

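Abstract:	Clustering partitions data and uncovers useful information in it, and it has important applications in many fields. K-means is one of the more commonly used clustering methods, but for large-scale data under constraints on computing resources and time it is no longer adequate. The CFK-means method proposed in this paper is a fast, efficient incremental clustering method suited to large-scale data. It represents clusters with the Clustering Features (CF) structure, which retains and exploits cluster information more effectively. It obtains a cluster partition in a single scan of the data, its computation time and file-exchange time are several times lower than those of k-means, and its clustering accuracy is comparable to that of k-means. Comparative experiments on simulated data and real text collections demonstrate the effectiveness of CFK-means.
Keywords:	Clustering Features (CF); CFK-means algorithm; k-means algorithm; document clustering
Article ID:	1002-8331-(2002)11-0187-04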
An Incremental Clustering Algorithm for Large-Scale Data and Its Application on Document Clustering

Tang Chunsheng, Jin Yihui. An Incremental Clustering Algorithm for Large-Scale Data and Its Application on Document Clustering [J]. Computer Engineering and Applications, 2002, 38(11): 187-190, 195.

Authors:	Tang Chunsheng, Jin Yihui

Abstract:	Clustering is an effective way to discover valuable information in data, and it is applied in many domains. The k-means algorithm is an important clustering method, but it is difficult to apply to large-scale data, especially when computing resources and time are limited. This paper presents an incremental algorithm, the CFK-means method, to solve this problem. By incorporating the clustering features (CF) structure into the k-means algorithm, more cluster information can be retained and used. Clustering results are obtained quickly in one scan of the data. The computing and file-exchange time of the CFK-means method is several times less than that of the k-means algorithm, and the accuracy of the results is almost equal to that of the k-means algorithm. Experiments on simulated data and real text sets demonstrate the effectiveness of the method.

Keywords:	Clustering Features (CF); CFK-means algorithm; k-means algorithm; document clustering
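
The abstract names the clustering features (CF) structure and a single scan of the data but does not spell out the CFK-means procedure itself. The sketch below is therefore only an illustration under stated assumptions: it uses the CF triple (point count, linear sum, sum of squared norms) in the form popularized by the BIRCH algorithm, and a simple nearest-centroid assignment rule that may differ from the paper's method. The names `ClusteringFeature` and `one_pass_cluster` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ClusteringFeature:
    """CF triple summarizing a cluster without storing its points (assumed BIRCH-style form)."""
    n: int            # number of points absorbed so far
    ls: List[float]   # per-dimension linear sum of the points
    ss: float         # sum of squared norms of the points

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusteringFeature":
        return cls(1, [float(v) for v in x], float(sum(v * v for v in x)))

    def add(self, x: Sequence[float]) -> None:
        # Absorbing a point only updates the three summaries.
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def centroid(self) -> List[float]:
        return [a / self.n for a in self.ls]


def one_pass_cluster(
    points: Sequence[Sequence[float]], k: int
) -> Tuple[List[ClusteringFeature], List[int]]:
    """Assumed single-scan rule: the first k points seed the clusters,
    and every later point joins the cluster with the nearest centroid."""
    cfs: List[ClusteringFeature] = []
    labels: List[int] = []
    for x in points:
        if len(cfs) < k:
            cfs.append(ClusteringFeature.from_point(x))
            labels.append(len(cfs) - 1)
            continue
        dists = [math.dist(x, cf.centroid()) for cf in cfs]
        j = dists.index(min(dists))
        cfs[j].add(x)
        labels.append(j)
    return cfs, labels


if __name__ == "__main__":
    data = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0]]
    cfs, labels = one_pass_cluster(data, k=2)
    print(labels)
    print([cf.centroid() for cf in cfs])
```

Because each cluster is held only as its CF triple, every point is read once and then discarded, which is the property the abstract attributes to one-scan clustering; seeding from the first k points is a simplification chosen here for brevity.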
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
