iMapReduce: A Distributed Computing Framework for Iterative Computation
Authors:Yanfeng Zhang  Qixin Gao  Lixin Gao  Cuirong Wang
Affiliation:1. School of Information Science and Engineering, Northeastern University, 11 Wenhua Road, Shenyang, Liaoning, 110819, China
2. Department of Electrical and Information Engineering, Northeastern University at Qinhuangdao, 143 Taishan Road, Qinhuangdao, Hebei, 066000, China
3. Department of Electrical and Computer Engineering, University of Massachusetts Amherst, 151 Holdsworth Way, Amherst, MA, 01002, USA
Abstract:Iterative computation is pervasive in applications such as data mining, web ranking, graph analysis, and online social network analysis. These applications typically involve massive data sets containing millions or billions of records, which creates demand for distributed computing frameworks that can process such data on a cluster of machines. MapReduce is one such framework. However, MapReduce lacks built-in support for iterative processing, which requires passing over the data sets repeatedly. Besides specifying MapReduce jobs, users have to write a driver program that submits a series of jobs and performs convergence testing at the client. This paper presents iMapReduce, a distributed framework that supports iterative processing. iMapReduce allows users to specify the iterative computation with separate map and reduce functions, and provides automatic iterative processing within a single job. More importantly, iMapReduce significantly improves the performance of iterative implementations by (1) reducing the overhead of repeatedly creating new MapReduce jobs, (2) eliminating the shuffling of static data, and (3) allowing asynchronous execution of map tasks. We implement an iMapReduce prototype based on Apache Hadoop and show that iMapReduce achieves up to a 5-fold speedup over Hadoop for iterative algorithms.
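The driver-program pattern that the abstract attributes to plain MapReduce can be illustrated with a small in-memory sketch. It is not the iMapReduce API and not Hadoop code: `run_job`, `make_pagerank_job`, and `iterate_until_converged` are hypothetical names, PageRank is chosen only as a familiar iterative algorithm, and the damping factor and tolerance are assumed values. The sketch launches one simulated job per iteration, re-emits the static adjacency lists through the shuffle every time, and tests convergence in the client loop, which are the overheads the abstract says iMapReduce targets.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor, not taken from the paper


def run_job(map_fn, reduce_fn, records):
    """Simulate one MapReduce job in memory: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # the "shuffle"
    return [(key, reduce_fn(key, values)) for key, values in sorted(groups.items())]


def make_pagerank_job(num_nodes):
    """Build map and reduce functions for one PageRank iteration."""

    def pr_map(node, state):
        rank, neighbors = state
        # The static adjacency list is re-emitted and shuffled on every
        # iteration so that the next job can see it.
        yield node, ("links", neighbors)
        for neighbor in neighbors:
            yield neighbor, ("rank", rank / len(neighbors))

    def pr_reduce(node, values):
        neighbors, incoming = [], 0.0
        for tag, payload in values:
            if tag == "links":
                neighbors = payload
            else:
                incoming += payload
        rank = (1 - DAMPING) / num_nodes + DAMPING * incoming
        return rank, neighbors

    return pr_map, pr_reduce


def iterate_until_converged(graph, tolerance=1e-6, max_iterations=100):
    """Client-side driver: submit one job per iteration, then test convergence.

    Assumes every neighbor also appears as a key of `graph`.
    """
    n = len(graph)
    pr_map, pr_reduce = make_pagerank_job(n)
    records = [(node, (1.0 / n, nbrs)) for node, nbrs in sorted(graph.items())]
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        new_records = run_job(pr_map, pr_reduce, records)  # a new job each time
        old_ranks = {node: state[0] for node, state in records}
        delta = sum(abs(rank - old_ranks.get(node, 0.0))
                    for node, (rank, _) in new_records)
        records = new_records
        if delta < tolerance:  # convergence test performed at the client
            break
    return {node: rank for node, (rank, _) in records}, iterations


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks, iterations = iterate_until_converged(toy_graph)
    print(iterations, ranks)
```

In this sketch the adjacency lists pass through `run_job` unchanged on every iteration. That repeated movement of static data, together with the per-iteration job launch, is the kind of cost the abstract lists under points (1) and (2).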
Keywords:
This article is indexed in SpringerLink and other databases.