Profiling and evaluating hardware choices for MapReduce environments: An application-aware approach |
| |
Affiliation: | 1. Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA 94043, United States; 2. Hewlett–Packard Laboratories, Palo Alto, CA 94304, United States; 3. Department of Computer Science, University of Illinois at Urbana–Champaign, Urbana, IL 61801, United States |
| |
Abstract: | The core business of many companies depends on the timely analysis of large quantities of new data. MapReduce clusters that routinely process petabytes of data represent a new entity in the evolving landscape of clouds and data centers. During the lifetime of a data center, old hardware eventually needs to be replaced by new hardware, and the hardware selection process should be driven by the performance objectives of the existing production workloads. In this work, we present a general framework, called Ariel, that automates system administrators' efforts in evaluating different hardware choices and in predicting the completion times of MapReduce applications when they are migrated to a Hadoop cluster built on the new hardware. The proposed framework consists of two key components: (i) a set of microbenchmarks that profile the MapReduce processing pipeline on a given platform, and (ii) a regression-based model that establishes a performance relationship between the source and target platforms. Benchmarking and model derivation can be done on a small test cluster built on the new hardware; the derived model can then be used to predict job completion times on a large Hadoop cluster and to size that cluster to meet desirable service level objectives (SLOs). We validate the effectiveness of the proposed approach using a set of twelve realistic MapReduce applications and three different hardware platforms. The evaluation study justifies our design choices and shows that the derived model accurately predicts the performance of the test applications: the predicted completion times of eleven of the twelve applications are within 10% of the measured completion times on the target platforms. |
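The abstract's core idea can be illustrated with a small sketch: for each phase of the MapReduce pipeline, fit a linear model mapping the phase duration measured on the source platform to the duration measured on the target (new-hardware) test cluster, then scale a profiled job phase by phase to predict its completion time on the target. All names and numbers below are illustrative assumptions, not the actual Ariel implementation or its benchmark data.

```python
# Minimal sketch of a per-phase regression model relating source- and
# target-platform performance. The phases, measurements, and helper names
# are hypothetical; the paper's framework uses its own microbenchmark suite.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

# Illustrative microbenchmark durations (seconds) per pipeline phase:
# (source-platform runs, matching target-platform runs on the test cluster).
benchmarks = {
    "read":    ([10.0, 20.0, 40.0], [6.0, 12.0, 24.0]),
    "map":     ([15.0, 30.0, 60.0], [9.0, 18.0, 36.0]),
    "shuffle": ([8.0, 16.0, 32.0], [7.6, 15.2, 30.4]),
    "reduce":  ([12.0, 24.0, 48.0], [8.4, 16.8, 33.6]),
}

# One (slope, intercept) model per phase.
models = {phase: fit_linear(src, tgt) for phase, (src, tgt) in benchmarks.items()}

def predict_completion(source_phase_durations):
    """Predict target-platform completion time by mapping each profiled
    source-platform phase duration through its fitted model and summing."""
    return sum(a * source_phase_durations[p] + b
               for p, (a, b) in models.items())

# A job profiled on the source platform (seconds per phase).
job = {"read": 120.0, "map": 300.0, "shuffle": 90.0, "reduce": 150.0}
print(round(predict_completion(job), 1))  # ≈ 442.5 with these made-up numbers
```

In the paper's setting the right-hand measurements come from a small test cluster of the new hardware, so the fitted relationship can be applied to predict job completion times on a much larger production cluster without migrating it first.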
| |
Keywords: | MapReduce; Benchmarking; Performance modeling |
This article is indexed by ScienceDirect and other databases. |
|