Optimizing and Tuning MapReduce Jobs to Improve the Large‐Scale Data Analysis Process
Authors: Wichian Premchaiswadi, Walisa Romsaiyud
Affiliation: Graduate School of Information Technology, Siam University, Bangkok 10160, Thailand
Abstract: Data-intensive applications process large volumes of data using parallel processing methods. MapReduce is both a programming model designed for data-intensive applications over massive data sets and an execution framework for large-scale data processing on clusters of commodity servers. Fault tolerance, a simple programming structure, and high scalability are considered strong points of MapReduce; however, its configuration parameters must be fine-tuned to the specific deployment, which makes configuration and performance tuning complex. This paper explains the tuning of the Hadoop configuration parameters that directly affect the performance of a MapReduce job's workflow under various conditions, with the goal of achieving maximum performance. On the basis of the empirical data we collected, it became apparent that three main methodologies can affect the execution time of MapReduce jobs running on cluster systems. Therefore, we present a model that consists of three main modules: (1) extending a data-redistribution technique in order to find the high-performance nodes, (2) utilizing the number of map/reduce slots in order to reduce execution time, and (3) developing a new hybrid routing schedule for the shuffle phase in order to define the scheduler task while reducing memory-management overhead.
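
The paper itself is summarized here without code; as a minimal illustrative sketch, the word-count job below shows how two standard shuffle-related Hadoop parameters of the kind discussed in the abstract can be set per job. The parameter keys (io.sort.mb, mapred.job.shuffle.input.buffer.percent) are standard Hadoop MRv1-era names, but the specific values and the reduce-task count are assumptions for illustration, not the authors' tuned settings.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TunedWordCount {

  public static class TokenMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Map-side sort buffer (MB): a larger buffer means fewer spills to
    // local disk before the shuffle, at the cost of task heap memory.
    // The value 256 is an illustrative assumption.
    conf.setInt("io.sort.mb", 256);

    // Fraction of reducer heap used to buffer fetched map outputs
    // before merging; lowering it eases shuffle-phase memory pressure.
    conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);

    Job job = Job.getInstance(conf, "tuned wordcount");
    job.setJarByClass(TunedWordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Reduce-task count: a common heuristic is a value slightly below
    // the cluster's total reduce-slot capacity; 8 is a placeholder.
    job.setNumReduceTasks(8);

    // Note: per-node slot counts (mapred.tasktracker.map.tasks.maximum
    // and mapred.tasktracker.reduce.tasks.maximum) are TaskTracker-side
    // settings in mapred-site.xml, not per-job parameters.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}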