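Distributed clustering algorithm with high communication efficiency for streaming data (高通信效率的分布式流数据聚类算法)
Cite as:ZHU Qiang (朱强), SUN Yuqiang (孙玉强). Distributed clustering algorithm with high communication efficiency for streaming data[J]. Journal of Computer Applications (计算机应用), 2014, 34(9): 2505-2509. DOI: 10.11772/j.issn.1001-9081.2014.09.2505
Authors:ZHU Qiang (朱强)  SUN Yuqiang (孙玉强)
Affiliation:1. Educational Technology Center, Zhejiang University of Media and Communications, Hangzhou 310018; 2. School of Mathematics and Physics, Changzhou University, Changzhou, Jiangsu 213164
Funding:Supported by the Natural Science Foundation of Zhejiang Province
Abstract:Sensor nodes have limited resources, and high communication overhead drains a large amount of power. To reduce the communication overhead of distributed clustering over streaming data, an efficient distributed streaming data clustering algorithm is proposed. The algorithm has two phases: online local clustering and offline global collaborative clustering. The online local clustering algorithm clusters each stream data source locally and sends the serialized clustering results to the collaborative node; the collaborative node performs global clustering once it has received the local clustering information from the different stream data sources. The experiments show that, as the window size keeps increasing, the time the algorithm spends sending data stays constant, while its clustering time and total time grow linearly; from this it is concluded that the execution time of the proposed algorithm is not affected by the sliding window width or the number of clusters. The accuracy of the algorithm is also close to that of the centralized algorithm, and its communication overhead is far lower than that of related distributed algorithms. The experimental results indicate that the algorithm scales well and can be applied to clustering analysis of large-scale distributed streaming data sources.

Keywords:data communication  data mining  clustering algorithm  streaming data  distributed system
Received:2014-04-01
Revised:2014-06-16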

Distributed clustering algorithm with high communication efficiency for streaming data
ZHU Qiang,SUN Yuqiang. Distributed clustering algorithm with high communication efficiency for streaming data[J]. Journal of Computer Applications, 2014, 34(9): 2505-2509. DOI: 10.11772/j.issn.1001-9081.2014.09.2505
Authors:ZHU Qiang  SUN Yuqiang
Affiliation:1. Educational Technology Center, Zhejiang University of Media and Communications, Hangzhou, Zhejiang 310018, China
2. School of Mathematics and Physics, Changzhou University, Changzhou, Jiangsu 213164, China
Abstract:Sensor nodes have limited resources, and high communication overhead consumes a large share of their power. To reduce the communication overhead of distributed clustering over streaming data, an efficient algorithm with two phases, online local clustering and offline global collaborative clustering, was proposed. The online local clustering algorithm clustered the data at each remote stream data source and then sent the serialized results to the collaborative node. The collaborative node collected the local clusters from all sources and analyzed them to obtain the global clusters. The experimental results show that, as the sliding window grows, the time for sending data stays constant while the clustering time and the total time grow linearly, from which it is concluded that the execution time of the algorithm is not affected by sliding window size or cluster number. The accuracy of the proposed algorithm is close to that of the centralized algorithm, and its communication overhead is far lower than that of related distributed algorithms. The results also indicate that the proposed algorithm scales well and can be applied to the clustering analysis of large-scale distributed streaming data sources.
Keywords:data communication  data mining  clustering algorithm  streaming data  distributed system
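
The two-phase scheme summarized in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not say which local clustering method, summary format, or serialization technique the paper uses. The sketch assumes k-means at each source, a summary of (centroid, count) pairs, JSON as the wire format, and weighted k-means at the collaborative node. All function names (`kmeans`, `local_summary`, `global_clusters`, `make_window`) are hypothetical.

```python
import json
import random


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, weights=None, iters=10):
    """Weighted Lloyd's k-means; returns (centroids, labels)."""
    weights = weights or [1.0] * len(points)
    # Farthest-point initialisation: deterministic and adequate for a sketch.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        )

    def assign():
        return [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            total = sum(w for w, l in zip(weights, labels) if l == j)
            if total > 0:  # keep the old centroid if the cluster is empty
                centroids[j] = tuple(
                    sum(p[d] * w for p, w, l in zip(points, weights, labels) if l == j)
                    / total
                    for d in range(len(points[0]))
                )
    return centroids, assign()


def local_summary(window, k):
    """Online phase (assumed form): cluster one source's current window and
    serialize only (centroid, count) pairs, never the raw points."""
    centroids, labels = kmeans(window, k)
    summary = [
        {"centroid": list(c), "count": labels.count(j)}
        for j, c in enumerate(centroids)
        if labels.count(j) > 0
    ]
    return json.dumps(summary).encode("utf-8")


def global_clusters(messages, k):
    """Offline phase (assumed form): the collaborative node deserializes the
    local summaries and clusters the centroids, weighted by their counts."""
    points, weights = [], []
    for msg in messages:
        for item in json.loads(msg.decode("utf-8")):
            points.append(tuple(item["centroid"]))
            weights.append(float(item["count"]))
    centroids, _ = kmeans(points, k, weights=weights)
    return centroids


def make_window(n, seed):
    """Synthetic window: two groups around (0, 0) and (10, 10)."""
    rng = random.Random(seed)
    return [
        (rng.gauss(c, 0.5), rng.gauss(c, 0.5))
        for c in (0.0, 10.0)
        for _ in range(n // 2)
    ]


# The message size depends on k, not on the window size.
small = local_summary(make_window(100, seed=1), k=2)
large = local_summary(make_window(4000, seed=1), k=2)
print(len(small), len(large))

# Three sources report to one collaborative node.
centers = global_clusters(
    [local_summary(make_window(200, seed=s), k=2) for s in range(3)], k=2
)
print(centers)
```

Under these assumptions, each source transmits a payload whose size is governed by the number of clusters rather than by the window width, which is in line with the constant data-sending time reported in the abstract, while the local clustering cost still grows with the window.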
This article is indexed in Wanfang Data (万方数据) and other databases.