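A Performance Optimization Method Based on Dynamic Topology for Stream Computing and Its Implementation in Storm
Cite this article:LU Jia-wei, WU Han, CHEN Hong, ZHANG Yuan-ming, LIANG Qian-hui, XIAO Gang. A Performance Optimization Method Based on Dynamic Topology for Stream Computing and Its Implementation in Storm[J]. Acta Electronica Sinica, 2020, 48(5): 878-890.
Authors:LU Jia-wei  WU Han  CHEN Hong  ZHANG Yuan-ming  LIANG Qian-hui  XIAO Gang
Affiliation:1. Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023, China; 2. Team of Big Data Computing and Service, Department of Infrastructure Business, Alibaba, Hangzhou, Zhejiang 310011, China; 3. School of Computer Science and Engineering, Nanyang Technological University, Singapore 637457, Singapore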
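Abstract:Responsiveness and stability have long been two critical concerns in stream computing. Under overload, however, stream computing systems often show increased processing latency and topology instability, and cannot adapt to dynamic changes in data load. To address this problem, this paper proposes a performance optimization method for stream computing based on dynamic topology, comprising four parts: (1) Dynamic step-by-step backpressure: a task in the topology can dynamically adjust the rate at which upstream tasks send data to it, according to its own current load. (2) Stateless topology data replay: the topology does not maintain the computation state of the data, while achieving data fault tolerance as far as possible. (3) Adaptive topology replacement: task parallelism is adjusted automatically without pausing the topology. (4) Delayed persistent queue: disk I/O in the topology is deferred out of the data processing path, which reduces the impact of frequent I/O blocking on the stream computing system. The four schemes are implemented in Apache Storm. Performance tests show that, compared with the default Storm implementation, the optimized system not only enhances the dynamic matching capability for big data, but in the best case also improves throughput by 17% and data processing speed by about 20%.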

Keywords:data stream topology  stream computing  big data  stream computing system  performance optimization  
Received:2019-07-02
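
Item (1) of the abstract, dynamic step-by-step backpressure, can be illustrated with a small simulation. The sketch below is not the authors' implementation and does not use any Storm API: the `Task` class, the high and low watermarks, the halve-then-recover rate policy, and all numeric values are assumptions chosen only to show how a slow downstream task can throttle its direct upstream, which in turn throttles the source one level at a time.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A hypothetical topology task with a bounded input queue (counts only)."""
    name: str
    capacity: int       # maximum tuples the input queue can hold
    service_rate: int   # tuples this task can handle per tick
    high_mark: int      # queue length at or above which upstream is throttled
    low_mark: int       # queue length at or below which upstream may speed up
    queued: int = 0

    def offer(self, n: int) -> int:
        """Accept up to n tuples, limited by free space; return the number accepted."""
        accepted = min(n, self.capacity - self.queued)
        self.queued += accepted
        return accepted

    def take(self, n: int) -> int:
        """Remove up to n tuples from the queue; return the number removed."""
        taken = min(n, self.queued)
        self.queued -= taken
        return taken

    def requested_rate(self, current: int, ceiling: int) -> int:
        """Rate this task asks its direct upstream to use on the next tick."""
        if self.queued >= self.high_mark:
            return max(1, current // 2)        # overloaded: halve the upstream rate
        if self.queued <= self.low_mark:
            return min(ceiling, current + 1)   # drained: let upstream recover slowly
        return current


def simulate(ticks: int = 60, source_max: int = 10) -> dict:
    """Simulate the chain source -> A -> B, where B is the slow stage."""
    a = Task("A", capacity=20, service_rate=8, high_mark=16, low_mark=6)
    b = Task("B", capacity=20, service_rate=3, high_mark=16, low_mark=6)
    source_rate, a_rate = source_max, a.service_rate
    emitted = processed = 0
    source_rates = []
    for _ in range(ticks):
        emitted += a.offer(source_rate)        # source -> A
        # A forwards to B, limited by its granted rate and by B's free space.
        movable = min(a_rate, a.queued, b.capacity - b.queued)
        a.take(movable)
        b.offer(movable)
        processed += b.take(b.service_rate)    # B finishes some tuples
        # Each task signals only its direct upstream (level by level).
        a_rate = b.requested_rate(a_rate, a.service_rate)
        source_rate = a.requested_rate(source_rate, source_max)
        source_rates.append(source_rate)
    return {
        "emitted": emitted,
        "processed": processed,
        "queued_a": a.queued,
        "queued_b": b.queued,
        "source_rates": source_rates,
    }
```

Calling `simulate()` returns the per-tick source rates along with tuple counts. In this toy setup B first slows A, A's queue then fills, and only after that is the source rate reduced, which is the level-by-level propagation the abstract describes.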

A Performance Optimization Method Based on Dynamic Topology for Stream Computing and Its Implementation in Storm
LU Jia-wei, WU Han, CHEN Hong, ZHANG Yuan-ming, LIANG Qian-hui, XIAO Gang. A Performance Optimization Method Based on Dynamic Topology for Stream Computing and Its Implementation in Storm[J]. Acta Electronica Sinica, 2020, 48(5): 878-890.
Authors:LU Jia-wei  WU Han  CHEN Hong  ZHANG Yuan-ming  LIANG Qian-hui  XIAO Gang
Affiliation:1. Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023, China; 2. Team of Big Data Computing and Service, Department of Infrastructure Business, Alibaba, Hangzhou, Zhejiang 310011, China; 3. School of Computer Science and Engineering, Nanyang Technological University, Singapore 637457, Singapore
Abstract:Responsiveness and stability have always been two important problems in stream computing. However, as the scale of data processed in real time has grown, data processing latency and topology instability have increased, and many limitations of stream processing systems have become apparent. To address these problems, we present a performance optimization method based on dynamic topology for stream computing: (1) Dynamic step-by-step backpressure: a task in the topology can dynamically adjust the rate of upstream data transmission according to its current load. (2) Stateless topology data replay: the topology can achieve data fault tolerance autonomously without maintaining the computation state of the data. (3) Adaptive topology replacement: the system can adjust task concurrency spontaneously without suspending the topology. (4) Delayed persistent queue: disk I/O reads and writes are deferred out of the data processing path, which mitigates the impact of high-frequency I/O blocking on the stream computing system. The four methods are implemented in Apache Storm. The experimental results show that the optimized system not only enhances the dynamic matching capability for big data, but also achieves 17% higher throughput and about 20% faster data processing in the best case.
Keywords:data stream topology  stream computing  big data  stream computing system  performance optimization  
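
Item (4) of the abstract, the delayed persistent queue, defers disk I/O out of the data processing path. A minimal sketch of that general idea follows. It is an illustration under stated assumptions, not the paper's design and not a Storm component: the class name `DelayedPersistentQueue`, the JSON-lines file format, and the `batch_size` parameter are all hypothetical.

```python
import json
import queue
import threading
from pathlib import Path


class DelayedPersistentQueue:
    """Hypothetical sketch: tuples are handed to a background writer thread,
    so the processing path only performs an in-memory enqueue."""

    _STOP = object()  # sentinel that tells the writer thread to finish

    def __init__(self, path, batch_size=100):
        self._path = Path(path)
        self._batch_size = batch_size
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def put(self, item):
        """Called on the processing path: no disk access happens here."""
        self._pending.put(item)

    def _drain(self):
        """Background loop: group items into batches and append them to disk."""
        batch = []
        with self._path.open("a", encoding="utf-8") as f:
            while True:
                item = self._pending.get()
                if item is self._STOP:
                    break
                batch.append(item)
                if len(batch) >= self._batch_size:
                    self._write(f, batch)
                    batch = []
            self._write(f, batch)  # write whatever is left before exiting

    @staticmethod
    def _write(f, batch):
        """Append one batch as JSON lines and flush it to the file."""
        for item in batch:
            f.write(json.dumps(item) + "\n")
        f.flush()

    def close(self):
        """Stop the writer and wait until every queued item is on disk."""
        self._pending.put(self._STOP)
        self._writer.join()
```

A caller would create the queue once, call `put()` for each tuple during processing, and call `close()` at shutdown. The trade-off in this sketch is that items still held in memory are lost if the process crashes before they are written, so a design of this kind needs some other recovery mechanism, such as replaying data from upstream.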