
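A Load Balancing Strategy for Periodic MapReduce Jobs
Cite this article: Fu Jie, Du Zhihui. A Load Balancing Strategy for Periodic MapReduce Jobs [J]. Computer Science (计算机科学), 2013, 40(3): 38-40.
Authors: Fu Jie, Du Zhihui
Affiliation: (Department of Computer Science and Technology, Tsinghua University, Beijing 100084)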
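Abstract: Load balancing of MapReduce tasks is achieved mainly through the partition function, and Hadoop's default partition function does not guarantee a balanced load across reducers. For periodic business processing, this paper proposes a load balancing strategy based on weight calculation, relying on the observation that the data distribution of a periodic job is similar to that of its historical data. The strategy derives data weights from information recorded while processing historical data (here a weight denotes the processing complexity of each record). It then samples the current batch during the Map phase, analyzes its distribution, and predicts the approximate overall weighted distribution of the data to be processed. This prediction guides Reduce partitioning so that reducer load stays balanced. A simple example simulates the whole strategy in operation and contrasts it with the TeraSort approach. Finally, an analysis of user video-access logs shows that the proposed strategy improves performance by nearly 100% over the default strategy.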

Keywords: MapReduce, TeraSort, load balancing, periodic

Load Balancing Strategy on Periodical MapReduce Job
Abstract: Load balancing of MapReduce tasks in Hadoop depends mainly on the partition function, and Hadoop's default partition function is not efficient in practical business processing. This paper presents a weight-based load balancing strategy for periodic jobs. Because the data distribution is similar from one period to the next, the weights are calculated from the profile of historical data. By analyzing a data sample in the Map phase to predict the approximate weighted distribution of the whole data set, the strategy guides Reduce partitioning to keep the load balanced. The paper also presents the differences between the TeraSort strategy and the new strategy. Experimental results on video-viewing logs show that the new strategy improves performance by about a factor of 2 compared with the default strategy.
Keywords:MapReduce  TeraSort  Load balance  Periodic
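
The abstract describes the strategy only in outline, so the following Python sketch is one possible reading of it rather than the authors' implementation. It assumes that keys are range-partitioned over sorted split points (in the style of TeraSort's sampled boundaries), that the historical profile is available as a per-key weight table, and that a key's estimated load is its sampled frequency times that weight. All function and variable names here are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def estimate_key_load(sample_keys, history_weight, default_weight=1.0):
    """Estimated load per key: sampled frequency times the historical
    per-record weight (assumed stand-in for processing complexity)."""
    counts = Counter(sample_keys)
    return {k: n * history_weight.get(k, default_weight)
            for k, n in counts.items()}


def choose_split_points(key_load, num_reducers):
    """Walk the keys in sorted order and close a partition each time the
    cumulative weighted load reaches the next equal share of the total."""
    total = sum(key_load.values())
    splits, running = [], 0.0
    for k in sorted(key_load):
        running += key_load[k]
        share = total * (len(splits) + 1) / num_reducers
        if len(splits) < num_reducers - 1 and running >= share:
            splits.append(k)
    return splits


def partition(key, splits):
    """Range partition: a key goes to the first reducer i with
    key <= splits[i], or to the last reducer if it exceeds every split."""
    return bisect_left(splits, key)


def reducer_loads(key_load, splits, num_reducers):
    """Total estimated load that each reducer would receive."""
    loads = [0.0] * num_reducers
    for k, w in key_load.items():
        loads[partition(k, splits)] += w
    return loads


# Hypothetical inputs: records with key "a" were three times as costly
# to process in earlier periods; the Map-phase sample is uniform in count.
history_weight = {"a": 3.0, "b": 1.0, "c": 1.0, "d": 1.0}
sample = ["a", "a", "b", "b", "c", "c", "d", "d"]

weighted = estimate_key_load(sample, history_weight)
weighted_splits = choose_split_points(weighted, 2)

# Splitting on record counts alone ignores the historical weights.
count_only = estimate_key_load(sample, {})
count_splits = choose_split_points(count_only, 2)

print("weighted splits:", weighted_splits,
      reducer_loads(weighted, weighted_splits, 2))
print("count-only splits:", count_splits,
      reducer_loads(weighted, count_splits, 2))
```

In this toy input every key appears equally often, so splitting on counts alone puts two keys on each reducer even though one key carries most of the work. Splitting on the weighted estimate isolates the expensive key instead. A single key is never divided between reducers in this sketch, so one dominant key can still limit how balanced the result is.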