Scheduling algorithm based on prefetching in MapReduce clusters
Affiliation:
1. College of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China; 2. School of Statistics, Jiangxi University of Finance and Economics, Nanchang 330013, China; 3. Research Center of Applied Statistics, Jiangxi University of Finance and Economics, Nanchang 330013, China
1. Department of Applied Mathematics and Computer Science, Ghent University, Belgium; 2. Affectv Limited, London, United Kingdom; 3. Department of Computer Science and AI, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, Spain; 4. Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
1. Key Laboratory of Modern Teaching Technology, Ministry of Education (Shaanxi Normal University), Xi’an 710062, China; 2. School of Computer Science, Shaanxi Normal University, Xi’an 710062, China; 3. School of Automation, Northwestern Polytechnical University, Xi’an 710072, China; 4. Department of Vehicle Engineering, Xi’an Aeronautical, Xi’an 710077, China
1. Department of Electrical Engineering, COMSATS Institute of Information Technology, Attock, Pakistan; 2. Hamdard Institute of Information Technology, Hamdard University, Islamabad, Pakistan; 3. Department of Electronic Engineering, International Islamic University, Islamabad, Pakistan; 4. Department of Mathematics, Imam Khomeini International University, Qazvin, 34149-16818, Iran
1. CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application Systems, University of Science and Technology of China, Hefei, China; 2. USTC-Birmingham Joint Research Institute in Intelligent Computation and Its Applications (UBRI), University of Science and Technology of China, Hefei, China
Abstract: Because of competition for cluster resources and the task scheduling policy, some map tasks are assigned to nodes that do not hold their input data, which causes significant data access delay. Data locality is becoming one of the most critical factors affecting the performance of MapReduce clusters. Because machines in MapReduce clusters have large memory capacities that are often underutilized, prefetching input data into memory is an effective way to improve data locality. However, deciding what to prefetch and when to prefetch it still poses serious challenges to cluster designers. To use prefetching effectively, we have built HPSO (High Performance Scheduling Optimizer), a task scheduler based on a prefetching service that improves data locality for MapReduce jobs. The basic idea is to predict the most appropriate nodes for future map tasks based on the currently pending tasks, and then to preload the needed data into memory without delaying the launch of new tasks. We have implemented HPSO in Hadoop-1.1.2. The experimental results show that the method reduces the number of map tasks that incur remote data access delay and improves the performance of Hadoop clusters.
Keywords: Data locality; MapReduce; Prefetching; Task scheduler; Memory; Big data
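
The predict-then-preload idea in the abstract can be illustrated with a minimal sketch. This is not the HPSO implementation and does not use any Hadoop API; the `Node` class, the `remaining_secs` estimate, and the greedy rule in `plan_prefetch` are all hypothetical names and assumptions introduced only for illustration. The sketch assumes the scheduler can estimate when each node will next free a map slot, walks the nodes in that order, pairs each with a pending input block, and marks the block for in-memory prefetching when the node does not already store it locally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of a worker node (not a Hadoop class)."""
    name: str
    local_blocks: set        # input blocks already stored on this node
    remaining_secs: float    # assumed estimate of time until a map slot frees up
    prefetched: set = field(default_factory=set)  # blocks chosen for preloading


def plan_prefetch(pending_blocks, nodes):
    """Greedy sketch: predict a node for each pending block and mark
    non-local blocks for prefetching before the node's slot becomes free.

    Returns a list of (node_name, block, needs_prefetch) tuples.
    """
    pending = list(pending_blocks)
    plan = []
    # Nodes expected to free a slot soonest are predicted to run the next tasks.
    for node in sorted(nodes, key=lambda n: n.remaining_secs):
        if not pending:
            break
        # Prefer a block the node already holds, so nothing has to move.
        block = next((b for b in pending if b in node.local_blocks), None)
        needs_prefetch = block is None
        if needs_prefetch:
            # Otherwise avoid taking a block that is local to another node.
            others = set().union(
                *(n.local_blocks for n in nodes if n is not node)
            )
            block = next((b for b in pending if b not in others), pending[0])
            node.prefetched.add(block)
        pending.remove(block)
        plan.append((node.name, block, needs_prefetch))
    return plan


nodes = [
    Node("n1", {"b1"}, remaining_secs=5.0),
    Node("n2", {"b9"}, remaining_secs=2.0),
]
plan = plan_prefetch(["b1", "b2"], nodes)
print(plan)
```

In this toy run, `n2` is expected to free a slot first but holds neither pending block, so it is assigned `b2` for prefetching, which leaves `b1` to run data-local on `n1`. A real scheduler would also have to bound the memory used for prefetched data and handle mispredictions, which this sketch ignores.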
This article is indexed in ScienceDirect and other databases.