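Discovery and Maintenance Algorithm for Subspace Clustering over High-Dimensional Data Streams
Cite this article: Zhou Xiaoyun, Sun Zhihui, Zhang Baili, Yang Yidong. Discovery and Maintenance Algorithm for Subspace Clustering over High-Dimensional Data Streams [J]. Journal of Computer Research and Development, 2006, 43(5): 834-840.
Authors: Zhou Xiaoyun  Sun Zhihui  Zhang Baili  Yang Yidong
Affiliation: Department of Computer Science and Engineering, Southeast University, Nanjing 210096
Funding: Chinese Academy of Sciences funded project; Research Fund for the Doctoral Program of Higher Education, Ministry of Education
Abstract: In recent years, with the proliferation of data stream applications, research on data mining algorithms built on the data stream model has become an important frontier topic. This paper proposes SHStream, an algorithm based on the Hoeffding bound for discovering and maintaining subspace clusters over high-dimensional data streams. The algorithm partitions the data stream into segments (the segment length is determined by the Hoeffding bound), performs subspace clustering on each segment, and iterates until the clustering result meets the required clustering accuracy. To cope with the dynamic nature of data streams, the algorithm also adjusts and maintains the clustering results. It can effectively handle high-dimensional data streams and cluster data with arbitrarily shaped distributions. Experiments on real and synthetic datasets show that the algorithm has good applicability and effectiveness.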

Keywords: data stream  clustering algorithm  subspace clustering  Hoeffding bound
Received: 2005-07-04
Revised: 2005-11-15

An Efficient Discovering and Maintenance Algorithm of Subspace Clustering over High Dimensional Data Streams
Zhou Xiaoyun, Sun Zhihui, Zhang Baili, Yang Yidong. An Efficient Discovering and Maintenance Algorithm of Subspace Clustering over High Dimensional Data Streams [J]. Journal of Computer Research and Development, 2006, 43(5): 834-840.
Authors: Zhou Xiaoyun  Sun Zhihui  Zhang Baili  Yang Yidong
Affiliation: Department of Computer Science and Engineering, Southeast University, Nanjing 210096
Abstract: Data mining over data streams has become an active research field in recent years. This paper presents SHStream, a novel algorithm based on the Hoeffding bound for discovering and maintaining subspace clusters over high-dimensional data streams. SHStream partitions the data stream into segments (the length of each segment is computed from the Hoeffding bound), performs subspace clustering on the segments, and discovers clusters step by step. Meanwhile, to account for the dynamic nature of data streams, SHStream adjusts and maintains the clustering results. Using a grid- and density-based technique, SHStream handles high-dimensional clustering effectively and discovers clusters of arbitrary shape. Experimental results on real and synthetic datasets demonstrate that the approach is applicable and effective.
Keywords: data stream  clustering algorithm  subspace clustering  Hoeffding bound
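
The abstract says only that the segment length is derived from the Hoeffding bound; it does not give the formula SHStream actually uses. As a rough sketch, assuming the standard form of the bound (for n independent observations of a variable with range R, the sample mean is within ε = sqrt(R² ln(1/δ) / (2n)) of the true mean with probability at least 1 − δ), a segment length can be obtained by solving for n. The function names `hoeffding_epsilon`, `segment_length`, and `segments`, and the parameters `value_range`, `delta`, and `epsilon`, are illustrative assumptions and are not taken from the paper.

```python
import math
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: error margin after n observations.

    With probability at least 1 - delta, the sample mean of n independent
    observations with range `value_range` is within this margin of the
    true mean.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def segment_length(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n whose Hoeffding margin is at most `epsilon`.

    Assumption: this simply inverts the bound above; the paper may derive
    its segment length differently.
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    n = value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    return max(1, math.ceil(n))


def segments(stream: Iterable[T], length: int) -> Iterator[List[T]]:
    """Yield consecutive segments of `length` items; the last may be shorter."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, length))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    n = segment_length(value_range=1.0, delta=0.05, epsilon=0.05)
    print("segment length:", n)
    for i, seg in enumerate(segments(range(1500), n)):
        # A clustering step on each segment would go here.
        print("segment", i, "holds", len(seg), "points")
```

With `value_range = 1.0`, `delta = 0.05`, and `epsilon = 0.05`, this sketch yields segments of 600 points. Tightening `epsilon` or `delta` lengthens the segments, which is the usual trade-off between statistical confidence and how quickly a stream algorithm can react to change.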
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.