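Real-time clustering for massive data using Storm
Cite this article: WANG Mingkun, YUAN Shaoguang, ZHU Yongli, WANG Dewen. Real-time clustering for massive data using Storm[J]. Journal of Computer Applications, 2014, 34(11): 3078-3081.
Authors: WANG Mingkun  YUAN Shaoguang  ZHU Yongli  WANG Dewen
Affiliation: School of Control and Computer Engineering, North China Electric Power University (Baoding), Baoding, Hebei 071003, China
Funding: National Natural Science Foundation of China; Science and Technology Project of Shanxi Electric Power Company
Abstract: Existing platforms generally respond poorly in real time when processing massive data. To address this, the Storm distributed real-time computing platform is introduced for cluster analysis of large-scale data, and a DBSCAN algorithm based on the Storm framework is designed. The algorithm divides the whole process into stages such as data ingestion, clustering analysis, and result output, each implemented in one of the framework's predefined components. The components are connected through data streams to form a task entity, which is submitted to the cluster to run. Comparative analysis and performance monitoring verify that the proposed scheme offers low latency and high throughput, and that the cluster runs well with a balanced load. The experimental results show that the Storm platform processes massive data with high real-time performance and is capable of data mining tasks in a big data setting.

Keywords: Storm  massive data  clustering  real-time analysis
Received: 2014-07-28
Revised: 2014-08-04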

Real-time clustering for massive data using Storm
WANG Mingkun, YUAN Shaoguang, ZHU Yongli, WANG Dewen. Real-time clustering for massive data using Storm[J]. Journal of Computer Applications, 2014, 34(11): 3078-3081.
Authors:WANG Mingkun  YUAN Shaoguang  ZHU Yongli  WANG Dewen
Affiliation:School of Control and Computer Engineering, North China Electric Power University, Baoding, Hebei 071003, China
Abstract:To improve real-time responsiveness in massive data processing, the Storm distributed real-time computing platform was introduced for data mining, and a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm based on Storm was designed for massive data. The algorithm is divided into three main stages: data collection, clustering analysis, and result output. Each stage is implemented in one of Storm's predefined components, and the resulting job is submitted to the Storm cluster for execution. Comparative analysis and performance monitoring show that the system offers low latency and high throughput. The results indicate that Storm is well suited to real-time processing of massive data.
Keywords:Storm  massive data  clustering  real-time analysis
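
The three stages named in the abstract (data collection, clustering analysis, result output) can be illustrated with a small single-process sketch. This is not the authors' implementation and it does not use Storm: in a Storm topology the stages would typically map to a spout and bolts, whereas here they are plain Python functions. The function names (read_points, dbscan, emit, run_batch) and the parameter values are hypothetical, and the clustering step is textbook DBSCAN with a brute-force neighbor search.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def read_points(rows):
    """Data-collection stage (hypothetical): yield each row as a tuple of floats."""
    for row in rows:
        yield tuple(float(v) for v in row)


def dbscan(points, eps, min_pts):
    """Clustering stage: textbook DBSCAN, O(n^2) neighbor search."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


def emit(points, labels):
    """Result-output stage (hypothetical): group points by cluster label."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    return clusters


def run_batch(rows, eps=1.5, min_pts=3):
    """Chain the three stages over one batch of rows."""
    points = list(read_points(rows))
    return emit(points, dbscan(points, eps, min_pts))


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    for label, members in run_batch(sample).items():
        print(label, members)
```

Keeping each stage as a separate function mirrors the component split described in the abstract; how the paper partitions data across components and merges partial clusters is not stated here, so the sketch processes a single batch in memory.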
This article is indexed in CNKI, Wanfang Data, and other databases.