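Research on Clustering and Evolution Analysis of High Dimensional Data Stream
Cite this article: Zhou Xiaoyun, Sun Zhihui, Zhang Baili, Yang Yidong. Research on Clustering and Evolution Analysis of High Dimensional Data Stream[J]. Journal of Computer Research and Development, 2006, 43(11): 2005-2011.
Authors: Zhou Xiaoyun; Sun Zhihui; Zhang Baili; Yang Yidong
Affiliation: School of Computer Science and Engineering, Southeast University, Nanjing 210096
Funding: National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education
Abstract: Clustering algorithms for data streams have become an active research topic. This paper proposes CAStream, a subspace-based algorithm for clustering high-dimensional data streams and analyzing how the clusters evolve. The algorithm partitions the data space into a grid, records the summary statistics of the grid cells approximately, stores snapshots of potentially dense grid cells in an improved pyramidal time frame, and finally applies depth-first search to perform clustering and evolution analysis. CAStream handles high-dimensional data streams effectively and can discover clusters of arbitrary shape. Experiments on real and synthetic datasets show that the algorithm has good applicability and effectiveness.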

Keywords: data stream; clustering analysis; improved pyramidal time frame; evolution analysis
Received: 2005-09-04
Revised: 2006-04-25

Research on Clustering and Evolution Analysis of High Dimensional Data Stream
Zhou Xiaoyun, Sun Zhihui, Zhang Baili, Yang Yidong. Research on Clustering and Evolution Analysis of High Dimensional Data Stream[J]. Journal of Computer Research and Development, 2006, 43(11): 2005-2011.
Authors: Zhou Xiaoyun; Sun Zhihui; Zhang Baili; Yang Yidong
Affiliation: School of Computer Science and Engineering, Southeast University, Nanjing 210096
Abstract: Clustering analysis of data streams has become an active research topic. This paper presents CAStream, a subspace-based algorithm for clustering and evolution analysis over high-dimensional data streams. CAStream partitions the data space into grids, maintains approximate summary statistics for the grid cells, stores snapshots of potentially dense grid cells in an improved pyramidal time frame, and finally finds the clusters and analyzes their evolution with a depth-first search. CAStream can handle high-dimensional data streams and discover clusters of arbitrary shape. Experimental results on real and synthetic datasets demonstrate the applicability and effectiveness of the approach.
Keywords: data stream; clustering analysis; improved pyramidal time frame; evolution analysis
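
The grid-and-search idea summarized in the abstract can be illustrated with a minimal sketch. This is not the CAStream algorithm itself: it omits the subspace search, the approximate statistics, and the pyramidal time frame, and it works on a static batch of points. The function names (`cell_of`, `dense_cells`, `cluster_cells`), the fixed cell `width`, the `min_count` density threshold, and the face-adjacency rule are all assumptions made for illustration.

```python
from collections import Counter


def cell_of(point, width):
    """Map a point to the integer coordinates of its grid cell (hypothetical helper)."""
    return tuple(int(x // width) for x in point)


def dense_cells(points, width, min_count):
    """Count points per cell and keep cells holding at least min_count points."""
    counts = Counter(cell_of(p, width) for p in points)
    return {cell for cell, n in counts.items() if n >= min_count}


def neighbors(cell):
    """Yield the cells that share a face with `cell` (one step along one axis)."""
    for d in range(len(cell)):
        for step in (-1, 1):
            yield cell[:d] + (cell[d] + step,) + cell[d + 1:]


def cluster_cells(dense):
    """Group adjacent dense cells into clusters with an iterative depth-first search."""
    seen = set()
    clusters = []
    for start in sorted(dense):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            cell = stack.pop()
            component.add(cell)
            for nb in neighbors(cell):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(component)
    return clusters


points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.5, 0.5),
          (5.0, 5.0), (5.2, 5.1), (9.0, 9.0)]
dense = dense_cells(points, width=1.0, min_count=2)
clusters = cluster_cells(dense)
```

Because a cluster is a connected set of dense cells rather than a ball around a centroid, this style of clustering can follow arbitrarily shaped regions, which is the property the abstract attributes to CAStream.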
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.