动态滑动窗口的数据流聚类方法
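Cite this article: 张忠平, 王浩, 薛伟, 夏炎. 动态滑动窗口的数据流聚类方法[J]. 计算机工程与应用, 2011, 47(7): 135-138. DOI: 10.3778/j.issn.1002-8331.2011.07.039
Authors: 张忠平  王浩  薛伟  夏炎
Affiliation: College of Information Science and Engineering, Yanshan University (燕山大学), Qinhuangdao, Hebei 066004, China
Funding: National Natural Science Foundation of China; Scientific Research Program of the Hebei Provincial Department of Education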
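Abstract: Data stream clustering is an important problem in cluster analysis. To handle data streams whose arrival rate varies, a data stream clustering algorithm based on dynamic sliding windows is proposed on top of the two-phase clustering framework. In the online phase, micro-cluster features are introduced to store summary information about the stream; the stored summaries are used to adjust the sliding-window size dynamically, and the distance between each data point and the micro-cluster centers is computed to maintain the micro-cluster features. In the offline phase, the K-means algorithm performs macro-clustering on the results of the online phase to produce the final clusters. Experimental results show that the algorithm achieves relatively high clustering quality and good scalability.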

Keywords: data mining  data streams  clustering  sliding windows
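
The online phase described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact micro-cluster feature or the window-adjustment rule, so everything below is an assumption made for illustration: the feature is an additive summary (count, per-dimension linear sum and squared sum, last update time) in the style of CluStream, `max_dist` is a hypothetical absorption threshold, and `window_for_rate` is a hypothetical rule that sizes the time window to hold roughly `target_points` arrivals at the current rate.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary of absorbed points (assumed layout, not the paper's)."""
    n: int
    linear_sum: list
    square_sum: list
    last_update: float

    @classmethod
    def from_point(cls, point, t):
        return cls(1, list(point), [x * x for x in point], t)

    def center(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        # Root of the summed per-dimension variances; 0 for a single point.
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.linear_sum, self.square_sum))
        return math.sqrt(max(var, 0.0))

    def absorb(self, point, t):
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x
        self.last_update = t


def window_for_rate(rate, target_points, lo=1.0, hi=100.0):
    """Hypothetical rule: faster streams get a shorter time window."""
    if rate <= 0:
        return hi
    return min(hi, max(lo, target_points / rate))


def update(clusters, point, t, max_dist, window):
    """Handle one arrival: expire stale summaries, then absorb or create."""
    # Drop micro-clusters that were not updated within the current window.
    clusters[:] = [c for c in clusters if t - c.last_update <= window]
    nearest = min(clusters, key=lambda c: math.dist(c.center(), point),
                  default=None)
    if nearest is not None and math.dist(nearest.center(), point) <= max_dist:
        nearest.absorb(point, t)
    else:
        clusters.append(MicroCluster.from_point(point, t))
```

A caller would measure the arrival rate over a recent interval, pass it to `window_for_rate`, and hand the resulting window to `update` for each new point. Because the summaries are additive, a point is absorbed in time proportional to its dimensionality and no raw points need to be stored.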

Approach for data streams clustering over dynamic sliding windows
ZHANG Zhongping, WANG Hao, XUE Wei, XIA Yan. Approach for data streams clustering over dynamic sliding windows[J]. Computer Engineering and Applications, 2011, 47(7): 135-138. DOI: 10.3778/j.issn.1002-8331.2011.07.039
Authors: ZHANG Zhongping  WANG Hao  XUE Wei  XIA Yan
Affiliation: College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
Abstract: Clustering data streams is an important problem in cluster analysis. To handle data streams whose speed varies, an efficient data stream clustering algorithm over dynamic sliding windows is proposed, based on the two-phase framework. In the online component, a novel micro-cluster feature is introduced to store the important statistical information of the data stream. The clustering features are maintained dynamically by computing the distance from each data point to the center of each micro-cluster and by adjusting the size of the sliding window. In the offline component, the k-means algorithm is applied to the mean values of the micro-clusters produced by the online component to generate the final clustering results. Experimental results show that the approach achieves relatively high clustering purity and good scalability.
Keywords: data mining  data streams  clustering  sliding windows
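
The offline component described in the abstract, k-means applied to the mean values of the micro-clusters, might look roughly like the sketch below. It is plain Lloyd's k-means written for illustration, not the authors' implementation. Two details are assumptions: the optional `weights` argument (for example, the number of points in each micro-cluster) and the deterministic initialization from the first `k` centers.

```python
import math


def kmeans(points, k, weights=None, iters=20):
    """Lloyd's k-means over micro-cluster centers.

    weights is an assumption of this sketch; with weights=None every
    micro-cluster center counts equally.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    if weights is None:
        weights = [1.0] * len(points)
    # Deterministic initialization for the sketch: the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: weighted mean of each cluster's members.
        new_centroids = []
        for j in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels)
                       if lab == j]
            if not members:
                new_centroids.append(centroids[j])  # keep an empty cluster as is
                continue
            total = sum(w for _, w in members)
            dim = len(members[0][0])
            new_centroids.append(
                [sum(p[d] * w for p, w in members) / total for d in range(dim)])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

Since the input is a small set of micro-cluster centers instead of the raw stream, this step can be run on demand without touching the online summaries.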
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.