
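Implementation of a Load Balancing Mechanism in the Storm Stream Processing Platform
Cite this article: ZHANG Nan, CHAI Xiao-li, XIE Bin, TANG Peng. Implementation of a Load Balancing Mechanism in the Storm Stream Processing Platform[J]. Computer and Modernization, 2017, 0(12): 65.
Authors: ZHANG Nan  CHAI Xiao-li  XIE Bin  TANG Peng
Funding: Internally initiated project of the 32nd Research Institute of China Electronics Technology Group Corporation (ZQ160006; ZQ160007)
Abstract: The Storm stream processing platform addresses the poor real-time performance of traditional Hadoop-based batch processing systems and provides an efficient, fast, real-time framework for processing multi-source heterogeneous big data. However, during task assignment Storm considers only the ordering of available Slots across nodes and does not adequately account for each node's actual load, which easily leads to load imbalance. To address this, the paper improves the Storm scheduling algorithm on a Storm distributed stream processing system by ranking nodes with a weighted combination of available Slots and node load. Through data structure design, it keeps each rowkey random and unique so that load stays balanced across RegionServers. A batch write mechanism raises the HBase data write speed and thereby the efficiency of stream data storage. Comparison experiments against native Storm show that the algorithm improvement and mechanism optimization ensure fast data writes and improve cluster resource utilization, and that the improved system has clear advantages in practicality and efficiency.

Keywords: Storm    stream processing    distributed computing    batch processing    load balancing
Received: 2017-12-26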
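
The weighted ranking of available Slots and node load described in the abstract can be pictured with a minimal sketch. The abstract does not give the paper's actual formula, so everything here is an assumption made for illustration: the `Node` type, the `node_score` and `rank_nodes` functions, the weights `w_slot` and `w_load`, and the choice of a load metric normalized to the range 0.0 to 1.0.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """Hypothetical view of a supervisor node (not a Storm API type)."""
    name: str
    free_slots: int
    total_slots: int
    load: float  # assumed normalized load: 0.0 = idle, 1.0 = saturated


def node_score(node: Node, w_slot: float = 0.5, w_load: float = 0.5) -> float:
    """Weighted score: more free slots and lower load both raise the score."""
    slot_ratio = node.free_slots / node.total_slots if node.total_slots else 0.0
    return w_slot * slot_ratio + w_load * (1.0 - node.load)


def rank_nodes(nodes: List[Node], w_slot: float = 0.5,
               w_load: float = 0.5) -> List[Node]:
    """Return nodes that still have a free slot, best candidate first."""
    candidates = [n for n in nodes if n.free_slots > 0]
    return sorted(candidates,
                  key=lambda n: node_score(n, w_slot, w_load),
                  reverse=True)
```

With equal weights, a lightly loaded node with half its slots free can rank ahead of a heavily loaded node with all slots free. Setting `w_load=0.0` reduces the ranking to the slot-only ordering that the abstract attributes to native Storm.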

Realization of Load Balancing Mechanism in Storm Streaming Processing Platform
ZHANG Nan, CHAI Xiao-li, XIE Bin, TANG Peng. Realization of Load Balancing Mechanism in Storm Streaming Processing Platform[J]. Computer and Modernization, 2017, 0(12): 65.
Authors: ZHANG Nan  CHAI Xiao-li  XIE Bin  TANG Peng
Abstract: Compared with Hadoop-based batch processing, Storm offers real-time stream processing and provides an efficient, fast framework for multi-source heterogeneous data. However, worker assignment in a Storm cluster considers only the ordering of available Slots across nodes and ignores each node's current load, so load balancing requirements may not be met when more than one topology runs in the cluster. To improve efficiency and balance load in real-time stream processing, the paper proposes a Storm scheduling algorithm, built on a Storm-based distributed stream processing system, that ranks nodes by a weighted combination of available Slots and node load. In addition, the HBase rowkey is designed to be random and evenly distributed, which balances load across the RegionServers, improves cluster resource utilization, and substantially increases data write speed. Comparison experiments against the original Storm system show that the algorithm improvement and mechanism optimization ensure fast data writes and improve cluster resource utilization. The improved system has clear advantages in practicality and efficiency.
Keywords: Storm  streaming processing  distributed computing  batch processing  load balancing
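
The rowkey design and batched writes mentioned in the abstract can be sketched as follows. This is not the paper's implementation: the salting scheme, the `make_rowkey` function, the `BatchWriter` class, the `buckets` and `batch_size` parameters, and the `flush_fn` callback (a stand-in for a real HBase client call) are all hypothetical names chosen for illustration.

```python
import hashlib
from typing import Callable, List, Tuple

Row = Tuple[str, bytes]


def make_rowkey(source_id: str, timestamp_ms: int, seq: int,
                buckets: int = 16) -> str:
    """Build a salted rowkey (assumes buckets <= 100 for the 2-digit salt).

    The natural key keeps the rowkey unique as long as
    (source_id, timestamp_ms, seq) is unique; the hash-derived salt
    spreads consecutive keys across regions.
    """
    natural = f"{source_id}|{timestamp_ms:013d}|{seq:06d}"
    digest = hashlib.sha256(natural.encode("utf-8")).digest()
    salt = int.from_bytes(digest[:4], "big") % buckets
    return f"{salt:02d}|{natural}"


class BatchWriter:
    """Buffer rows and hand them to flush_fn in groups of batch_size."""

    def __init__(self, flush_fn: Callable[[List[Row]], None],
                 batch_size: int = 500) -> None:
        self._flush_fn = flush_fn
        self._batch_size = batch_size
        self._buffer: List[Row] = []

    def put(self, rowkey: str, value: bytes) -> None:
        self._buffer.append((rowkey, value))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered rows; call once more at shutdown."""
        if self._buffer:
            self._flush_fn(self._buffer)
            self._buffer = []
```

Because the salt is derived from the natural key rather than drawn at random, the same record always maps to the same rowkey, so a point lookup can recompute it. The trade-off is that a time-range scan has to query every salt bucket.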