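Adaptive Subspace Clustering Algorithm for High-Dimensional Data Streams
Cite this article:REN Jiadong, ZHOU Weiwei, HE Haitao. Adaptive Subspace Clustering Algorithm for High-Dimensional Data Streams[J]. 计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2010, 4(9): 859-864.
Authors:REN Jiadong  ZHOU Weiwei  HE Haitao
Affiliation:1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081
2. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004
Abstract:Clustering high-dimensional data streams is an active research topic in data mining. Because data streams are large in volume, change rapidly, and are high-dimensional, many clustering algorithms fail to achieve good clustering quality. This paper proposes SAStream, an adaptive subspace clustering algorithm for high-dimensional data streams. The algorithm improves the micro-cluster structure of HPStream and defines candidate clusters. It computes the distance from each newly arrived data point to the centroids of the candidate clusters only within the corresponding subspace, which reduces the number of micro-clusters examined during clustering. The resulting micro-clusters are stored in a pyramidal time frame, and a time decay function is used to delete expired micro-clusters. When the data rate is high, the limiting radius and the cluster selection factor are adjusted automatically according to the monitored system resource usage, which tunes the clustering granularity. Experimental results show that the algorithm achieves good clustering quality and fast data processing.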

Keywords:high-dimensional data stream  subspace clustering  data stream rate  adaptive
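
The abstract describes measuring the distance from a new point to candidate-cluster centroids only within each cluster's subspace, and discounting old data with a time decay function. The following is a minimal sketch of that idea, not the SAStream implementation from the paper. The class name `FadingMicroCluster`, the function `assign`, the decay form `2^(-decay * dt)`, and the dimension-normalized Euclidean distance are all assumptions chosen for illustration; the pyramidal time frame and the deletion of expired micro-clusters are not modelled.

```python
import math


class FadingMicroCluster:
    """Hypothetical micro-cluster holding exponentially faded statistics."""

    def __init__(self, point, dims, t, decay=0.01):
        self.dims = list(dims)          # indices of this cluster's subspace dimensions
        self.decay = decay              # assumed decay rate, not a value from the paper
        self.weight = 1.0               # faded count of absorbed points
        self.linear_sum = list(point)   # faded per-dimension sum of points
        self.last_update = t

    def _fade(self, t):
        # Discount the stored statistics by the time elapsed since the last update.
        factor = 2.0 ** (-self.decay * (t - self.last_update))
        self.weight *= factor
        self.linear_sum = [s * factor for s in self.linear_sum]
        self.last_update = t

    def centroid(self):
        return [s / self.weight for s in self.linear_sum]

    def subspace_distance(self, point):
        # Only the cluster's own dimensions contribute (assumed distance form).
        c = self.centroid()
        sq = sum((point[d] - c[d]) ** 2 for d in self.dims)
        return math.sqrt(sq / len(self.dims))

    def absorb(self, point, t):
        self._fade(t)
        self.weight += 1.0
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


def assign(point, t, candidates, limiting_radius):
    """Absorb the point into the nearest candidate within the radius, else return None."""
    best = min(candidates, key=lambda mc: mc.subspace_distance(point), default=None)
    if best is not None and best.subspace_distance(point) <= limiting_radius:
        best.absorb(point, t)
        return best
    return None
```

When `assign` returns `None`, a caller would typically start a new micro-cluster from the point. How SAStream selects the candidate clusters and the subspace dimensions is not stated in the abstract, so both are passed in here.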

Adaptive Clustering Algorithm for Mining Subspace Clusters in High-Dimensional Data Stream
REN Jiadong, ZHOU Weiwei, HE Haitao. Adaptive Clustering Algorithm for Mining Subspace Clusters in High-Dimensional Data Stream[J]. Journal of Frontiers of Computer Science and Technology, 2010, 4(9): 859-864.
Authors:REN Jiadong  ZHOU Weiwei  HE Haitao
Affiliation:1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Abstract:Clustering high-dimensional data streams is an active research topic in data mining. Because data streams are large in volume, rapidly changing, and high-dimensional, many clustering algorithms cannot achieve good clustering quality. This paper proposes a new adaptive clustering algorithm for mining subspace clusters in high-dimensional data streams, called SAStream. It improves the cluster structure of HPStream and defines candidate clusters. The algorithm computes the distance between each newly arriving data point and the centroids of the candidate clusters only, instead of all clusters, so fewer clusters are examined during clustering. The created clusters are stored in a pyramidal time frame, and a time fading function is used to discount the history of past behavior. When the data rate is high, the LimitingRadius and the cluster selection factor are adjusted automatically, which adjusts the clustering granularity. The experimental results show that the algorithm clusters well and processes data quickly.
Keywords:high-dimensional data stream  subspace clustering  data rate  adaptive
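
The abstract states that the LimitingRadius is adjusted automatically when the data rate is high, but gives no update rule. The sketch below shows one plausible shape for such an adjustment and should be read as an assumption throughout: the function name `adjust_limiting_radius`, the thresholds, the step size, and the direction of the change (a larger radius for coarser clusters under heavy load) are illustrative choices, not values or rules from the paper.

```python
def adjust_limiting_radius(radius, load, low=0.5, high=0.8,
                           step=0.1, r_min=0.1, r_max=10.0):
    """Return a new limiting radius for a resource-usage ratio `load` in [0, 1].

    All thresholds and bounds are hypothetical defaults.
    """
    if load > high:
        # Heavy load: widen the radius so points merge into fewer, coarser clusters.
        radius *= 1.0 + step
    elif load < low:
        # Light load: narrow the radius to recover finer clustering granularity.
        radius *= 1.0 - step
    # Keep the radius inside fixed bounds so repeated adjustments cannot run away.
    return max(r_min, min(r_max, radius))
```

A caller could invoke this once per monitoring interval with a measured memory or CPU ratio. The cluster selection factor mentioned in the abstract could be adapted in the same way, although the abstract does not describe how.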
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).