MRA++: Scheduling and data placement on MapReduce for heterogeneous environments
Affiliation: 1. Federal University of Rio Grande do Sul (UFRGS), Institute of Informatics - PPGC, Caixa Postal 15.064, 91.501-970, Porto Alegre - RS, Brazil; 2. Université Pierre et Marie Curie, CNRS INRIA - REGAL, 4 Place Jussieu 75005, Paris, France
Abstract: MapReduce has emerged as a popular programming model in the field of data-intensive computing. This is due to its simple design, which makes it easy for programmers to use, and to framework implementations such as Hadoop, which have been adopted by large business and technology companies. In this paper we improve the Hadoop MapReduce framework by introducing algorithms suited to heterogeneous environments, with the goal of performing data-intensive computing efficiently in such environments. These adaptations are needed because Hadoop, following the framework design proposed by Google, is optimized to run in large homogeneous clusters. Hence we propose MRA++, a new MapReduce framework design that considers the heterogeneity of nodes during data distribution, task scheduling and job control. MRA++ establishes a training task to gather information prior to the data distribution, which adds a delay to the setup phase. We show that this delay is offset by the effectiveness of the mechanisms and algorithms, which achieve performance gains of more than 70% in 10 Mbps networks.
Keywords: MapReduce; Distributed systems; Data placement; Data-intensive computing; Scheduling
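The abstract states that MRA++ runs a training task before distributing data so that placement can reflect node heterogeneity. The following is a minimal sketch of that general idea only, not the algorithm from the paper: it assumes each node reports how many seconds it took to finish an identical training task, and it splits a fixed number of input blocks in proportion to the inverse of that time. The function name `place_blocks`, its parameters, and the node names are hypothetical.

```python
def place_blocks(training_seconds, total_blocks):
    """Split total_blocks across nodes in proportion to measured speed.

    training_seconds: dict mapping node name -> seconds the node needed
    for the same training task (hypothetical measurement).
    Returns a dict mapping node name -> number of blocks assigned.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if not training_seconds or any(t <= 0 for t in training_seconds.values()):
        raise ValueError("every node needs a positive training time")

    # Assumption: a node's speed is the inverse of its training time.
    speeds = {node: 1.0 / t for node, t in training_seconds.items()}
    total_speed = sum(speeds.values())

    # Ideal (fractional) share of blocks for each node.
    ideal = {node: total_blocks * s / total_speed for node, s in speeds.items()}

    # Start from the rounded-down share, then hand out the leftover blocks
    # to the nodes with the largest fractional remainders.
    shares = {node: int(q) for node, q in ideal.items()}
    leftover = total_blocks - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda n: ideal[n] - shares[n], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    measured = {"fast": 1.0, "medium": 2.0, "slow": 4.0}
    print(place_blocks(measured, 70))
```

In this sketch a node that finishes the training task twice as fast receives roughly twice as many blocks, so all nodes would finish their map work at about the same time if the training task is representative of the real job.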
This article is indexed in ScienceDirect and other databases.