

Approximation algorithms and heuristics for task scheduling in data‐intensive distributed systems
Authors: Marcelo G. Póvoa, Eduardo C. Xavier
Affiliations: 1. Google, Belo Horizonte, Brazil; 2. Institute of Computing, University of Campinas, Campinas, Brazil
Abstract: In this work, we address the problem of task scheduling on large‐scale data‐intensive computing systems. To achieve good performance, one must construct not only good task schedules but also good allocations of data across the system's nodes, because a task can be executed only once it has access to its data, which is distributed across the system. In this article, we present a general formulation of a static problem that combines scheduling and replication in data‐intensive distributed systems. We show that this problem does not admit an approximation algorithm. However, an approximation algorithm can be designed for a restricted version of the problem that incorporates some practical constraints. From a practical perspective, we introduce a novel heuristic for the problem based on node clustering. We compare this heuristic with two approaches adapted from other works in the literature through computational simulations on an extensive set of instances based on real computer grids. Our heuristic often obtains the best solutions and also runs faster than the other approaches.
Keywords: data grids; task scheduling; data replication; approximation algorithms