
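An On-line Density-based Clustering Algorithm for Spatial Data Stream
Cite this article: YU Yan-Wei, WANG Qin, KUANG Jun, HE Jie. An On-line Density-based Clustering Algorithm for Spatial Data Stream [J]. Acta Automatica Sinica, 2012, 38(6): 1051-1059.
Authors: YU Yan-Wei, WANG Qin, KUANG Jun, HE Jie
Affiliation: 1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2011AA040101), the National Natural Science Foundation of China (61172049, 61003251), and the Doctoral Program Foundation of the Ministry of Education of China (20100006110015)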
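Abstract: To address the problem of clustering arbitrarily shaped clusters in spatial data streams, this paper proposes an on-line density-based clustering algorithm for spatial data streams (OLDStream). The algorithm clusters incremental spatial data on top of the previous clustering result, performing a local clustering update only on each newly added spatial point and on those points in its neighborhood that satisfy the core point condition. This reduces the time complexity of a clustering update and enables on-line clustering of spatial data streams. OLDStream processes large-scale spatial data streams quickly, provides global clustering results with arbitrarily shaped clusters in real time, is insensitive to the input order of the data stream, and can detect isolated points. Comprehensive experiments on real and synthetic data verify the clustering quality, efficiency, and scalability of the algorithm. Statistical analysis of the experimental results shows that only 4% of the spatial points incur the worst-case running time, and that the average clustering time per spatial point is about 0.033 ms.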

Keywords: spatial data mining; clustering data streams; density-based clustering; on-line algorithm; noise handling
Received: 2011-10-13
Revised: 2012-03-01

An On-line Density-based Clustering Algorithm for Spatial Data Stream
YU Yan-Wei, WANG Qin, KUANG Jun, HE Jie. An On-line Density-based Clustering Algorithm for Spatial Data Stream [J]. Acta Automatica Sinica, 2012, 38(6): 1051-1059.
Authors: YU Yan-Wei, WANG Qin, KUANG Jun, HE Jie
Affiliation: 1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083
Abstract: We propose an efficient on-line density-based clustering algorithm for spatial data streams (OLDStream), designed to discover clusters in a spatial data stream on-line. In OLDStream, a clustering update processes only the new spatial point and those of its neighboring points that satisfy the core point condition, and the overall clustering result can be accessed instantaneously. The algorithm offers several advantages: high scalability for on-line processing of large-scale incremental spatial data, the ability to discover overall clusters of arbitrary shape instantaneously, insensitivity to the input order of the data stream, and the ability to detect isolated points. We evaluated the effectiveness, efficiency, and scalability of the algorithm on real data and on large synthetic data produced with Matlab and Thomas Brinkhoff's network-based generator. The experimental results show that the algorithm clusters new points quickly and efficiently on the basis of previously processed points. Statistics of the results show that only 4% of the points take the worst-case running time, and that the average running time is about 0.033 ms per point.
Keywords: spatial data mining; clustering data stream; density-based clustering; on-line algorithm; handling noise
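
Illustrative sketch (not from the paper): the abstract describes a local update that touches only the new point and the nearby points that meet the core point condition. The Python sketch below shows one way an insertion-only, DBSCAN-style update of that kind could look. It is not the OLDStream algorithm itself. The class name, the parameters `eps` and `min_pts`, the brute-force neighbor search, and the union-find bookkeeping are all assumptions made for illustration.

```python
import math


class IncrementalDensityClusterer:
    """Insertion-only, DBSCAN-style clustering (illustrative, not OLDStream itself)."""

    def __init__(self, eps, min_pts):
        self.eps = eps          # neighborhood radius (assumed parameter name)
        self.min_pts = min_pts  # core threshold, counting the point itself
        self.points = []        # every point seen so far
        self.count = []         # neighborhood size of each point
        self.parent = []        # union-find forest linking core points

    def _neighbors(self, i):
        # Brute-force range query; a spatial index would replace this in practice.
        p = self.points[i]
        return [j for j, q in enumerate(self.points) if math.dist(p, q) <= self.eps]

    def _find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def _is_core(self, i):
        return self.count[i] >= self.min_pts

    def insert(self, point):
        """Add one point and update only the affected neighborhood."""
        i = len(self.points)
        self.points.append(tuple(point))
        self.parent.append(i)
        self.count.append(0)
        nbrs = self._neighbors(i)  # includes i itself
        self.count[i] = len(nbrs)
        new_cores = [i] if self._is_core(i) else []
        for j in nbrs:
            if j != i:
                self.count[j] += 1
                if self.count[j] == self.min_pts:  # j has just become a core point
                    new_cores.append(j)
        # Each newly created core point is merged with the core points around it.
        for c in new_cores:
            for j in self._neighbors(c):
                if self._is_core(j):
                    self.parent[self._find(j)] = self._find(c)
        return i

    def label(self, i):
        """Cluster id of point i, or -1 if it is currently an isolated point."""
        if self._is_core(i):
            return self._find(i)
        for j in self._neighbors(i):
            if self._is_core(j):
                return self._find(j)  # border point: take a core neighbor's cluster
        return -1


# Usage: two dense groups and one far-away point, fed one at a time.
clusterer = IncrementalDensityClusterer(eps=1.5, min_pts=3)
for p in [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]:
    clusterer.insert(p)
labels = [clusterer.label(i) for i in range(len(clusterer.points))]
```

In this sketch a cluster id is simply the index of a union-find root, so ids can change when two clusters merge; only the grouping is meaningful. The grouping of core points depends only on the set of points inserted, not on their order, while the cluster a border point is assigned to can vary when it lies near two clusters.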