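Load Balancing Strategy on MapReduce with Locality-aware
Cite this article: LI Hang-chen, QIN Xiao-lin, SHEN Yao. Load Balancing Strategy on MapReduce with Locality-aware[J]. Computer Science, 2015, 42(10): 50-56.
Authors: LI Hang-chen, QIN Xiao-lin, SHEN Yao
Affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China (all three authors)
Funding: This work was supported by the National Natural Science Foundation of China (61373015, 61300052), the Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education of China (20103218110017), and the Priority Academic Program Development of Jiangsu Higher Education Institutions.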
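Abstract: Existing research on load-balanced scheduling for MapReduce does not take into account how intermediate data are distributed or what network transfer costs, which causes extra network traffic and lowers system efficiency. To address this, a locality-aware load balancing strategy is proposed. Taking advantage of the new resource-management features in YARN, the strategy gathers statistics on the data distribution while in-memory data are being spilled to disk during the Map phase. It then schedules tasks according to this distribution and the computing capacity of each node, reducing network transfer overhead while keeping the load across nodes as balanced as possible. In addition, fine-grained partitioning and an adaptive partition-splitting strategy are introduced to further improve scheduling performance under data skew. Comparative experiments show that the proposed strategy effectively improves performance and reduces total network overhead.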
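The scheduling idea summarized above can be illustrated with a small greedy sketch. It is not the authors' algorithm (the abstract does not specify one): the functions `assign_partitions` and `network_bytes`, the `slack` parameter, and the input layout are all hypothetical. The sketch assumes that, for every intermediate partition, the number of bytes stored on each node has already been collected during the Map-side spill, and that each node's relative processing speed is known. Partitions are placed largest first. Each goes to whichever node, among those whose estimated finish time is within `slack` of the best, already holds the most of that partition's data. Supplying many small (fine-grained) partitions rather than one per reducer gives the greedy more room to balance load. The adaptive splitting of oversized partitions is not modeled.

```python
from typing import Dict

def assign_partitions(
    dist: Dict[str, Dict[str, int]],
    speed: Dict[str, float],
    slack: float = 0.1,
) -> Dict[str, str]:
    """Hypothetical locality-aware greedy placement of reduce partitions.

    dist[p][n]: bytes of intermediate partition p stored on node n
                (assumed to come from statistics gathered at spill time)
    speed[n]:   relative processing speed of node n (bytes per time unit)
    slack:      a node may finish up to (1 + slack) x the best finish time
                and still be chosen because it holds more local data
    """
    load = {n: 0.0 for n in speed}  # bytes assigned to each node so far
    assignment: Dict[str, str] = {}
    # Place large partitions first (longest-processing-time-first ordering).
    order = sorted(dist, key=lambda p: sum(dist[p].values()), reverse=True)
    for p in order:
        size = sum(dist[p].values())
        # Estimated finish time of each node if it also took partition p.
        finish = {n: (load[n] + size) / speed[n] for n in speed}
        best = min(finish.values())
        candidates = [n for n in speed if finish[n] <= best * (1 + slack)]
        # Prefer the candidate already holding the most bytes of p (least
        # data to pull over the network); ties go to the earlier finish.
        chosen = max(candidates, key=lambda n: (dist[p].get(n, 0), -finish[n]))
        assignment[p] = chosen
        load[chosen] += size
    return assignment

def network_bytes(dist: Dict[str, Dict[str, int]], assignment: Dict[str, str]) -> int:
    """Bytes that must cross the network: everything not already on the chosen node."""
    return sum(
        sum(dist[p].values()) - dist[p].get(node, 0)
        for p, node in assignment.items()
    )

if __name__ == "__main__":
    # Toy input: two equally fast nodes and four partitions.
    dist = {
        "p0": {"A": 80, "B": 20},
        "p1": {"A": 10, "B": 70},
        "p2": {"A": 30, "B": 30},
        "p3": {"B": 40},
    }
    speed = {"A": 1.0, "B": 1.0}
    a = assign_partitions(dist, speed)
    print(a)                       # {'p0': 'A', 'p1': 'B', 'p2': 'B', 'p3': 'A'}
    print(network_bytes(dist, a))  # 100
```

In this toy run both nodes end up with 140 bytes of work. Raising `slack` lets the scheduler trade some balance for less shuffle traffic, and lowering it does the reverse.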

Keywords: MapReduce; data locality; data skew; load balancing
Received: 2014-10-31
Revised: 2015-01-28

This article is indexed in Wanfang Data and other databases.