Similar Documents
20 similar documents found
1.
The purpose of this paper is to examine the impact of scheduling parallel tasks onto non-uniform memory access (NUMA) shared-memory multiprocessor systems by considering non-negligible intertask communications and communication contentions. Communication contentions arise from the communication medium having insufficient capacity to serve all transmissions, thereby causing significant contention delays. Therefore, a new scheduling algorithm, herein referred to as the Extended Critical Path (ECP) algorithm, is proposed. The proposed algorithm schedules parallel tasks by considering non-negligible intertask communications and the contentions among shared communication resources. Moreover, it ensures performance within a factor of two of the optimum for general directed acyclic task graphs (DATGs). Experimental results demonstrate the superiority of the ECP algorithm over the scheduling algorithms presented in previous literature.
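To make the contention point concrete: under a contention-aware model, each inter-task message must itself wait for the shared medium, so data-ready times depend on the order in which transfers are serviced. The Python sketch below illustrates this general model only; it is not the ECP algorithm, and all names and values are hypothetical.

```python
# Illustration of communication contention: when messages share one medium,
# a transfer cannot start before the medium is free, so data-ready times
# depend on the order in which messages are scheduled.

def data_ready_time(task, proc, parents, placed_on, finish, size, rate, medium_free):
    """Earliest time all inputs of `task` are available on processor `proc`,
    serialising remote transfers on a single shared medium."""
    ready = 0.0
    for p in parents[task]:
        if placed_on[p] == proc:
            arrival = finish[p]                         # local data: no transfer needed
        else:
            start = max(finish[p], medium_free[0])      # wait for the sender AND the medium
            arrival = start + size[(p, task)] / rate
            medium_free[0] = arrival                    # medium busy until the message lands
        ready = max(ready, arrival)
    return ready

# Example: two remote parents finishing at t=2 and t=3, each sending 4 units at rate 1.
# A contention-free model would give max(2+4, 3+4) = 7; with contention the second
# message has to wait for the medium, so the data-ready time becomes 10.
parents = {"t": ["a", "b"]}
print(data_ready_time("t", "P1", parents, {"a": "P0", "b": "P2"},
                      {"a": 2.0, "b": 3.0}, {("a", "t"): 4.0, ("b", "t"): 4.0},
                      1.0, [0.0]))   # 10.0
```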

2.
The multiprocessor scheduling problem is the problem of scheduling the tasks of a precedence constrained task graph (representing a parallel program) onto the processors of a multiprocessor in a way that minimizes the completion time. Since this problem is known to be NP-hard in the strong sense in all but a few very restricted cases, heuristic algorithms are being developed which obtain near optimal schedules in a reasonable amount of computation time. We present an efficient heuristic algorithm for scheduling precedence constrained task graphs with nonnegligible intertask communication onto multiprocessors, taking contention in the communication channels into consideration. Our algorithm for obtaining satisfactory suboptimal schedules is based on the classical list scheduling strategy. It simultaneously exploits the schedule-holes generated in the processors and in the communication channels during the scheduling process in order to produce better schedules. We demonstrate the effectiveness of our algorithm by comparing it with two competing heuristic algorithms available in the literature.
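The "schedule-holes" mentioned above are idle gaps between tasks already placed on a processor (or messages already placed on a channel); an insertion-based policy tries to start a new task inside the earliest gap that is large enough instead of appending it at the end. A minimal, self-contained Python sketch of that idea follows (illustrative only, not the authors' algorithm).

```python
def earliest_start_with_insertion(intervals, ready, duration):
    """Earliest start >= `ready` on a resource whose busy periods are `intervals`
    (a sorted list of (start, end)), fitting `duration` into an idle gap
    ("schedule hole") if one is large enough, otherwise after the last task."""
    prev_end = 0.0
    for start, end in intervals:
        gap_start = max(ready, prev_end)
        if gap_start + duration <= start:   # the task fits into this hole
            return gap_start
        prev_end = max(prev_end, end)
    return max(ready, prev_end)             # no hole fits: append at the end

# Example: busy periods (0,3) and (5,9); a task of length 2 that becomes ready
# at t=1 fits into the hole [3,5) and therefore starts at t=3.
print(earliest_start_with_insertion([(0, 3), (5, 9)], ready=1.0, duration=2.0))  # 3.0
```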

3.
Heterogeneous computing systems are promising computing platforms, since single parallel architecture based systems may not be sufficient to exploit the available parallelism with the running applications. In some cases, heterogeneous distributed computing (HDC) systems can achieve higher performance with lower cost than single-machine supersystems. However, in HDC systems, processors and networks are not failure free and any kind of failure may be critical to the running applications. One way of dealing with such failures is to employ a reliable scheduling algorithm. Unfortunately, most existing scheduling algorithms for precedence constrained tasks in HDC systems do not adequately consider reliability requirements of inter-dependent tasks. In this paper, we design a reliability-driven scheduling architecture that can effectively measure system reliability, based on an optimal reliability communication path search algorithm, and then we introduce reliability priority rank (RRank) to estimate the task’s priority by considering reliability overheads. Furthermore, based on directed acyclic graph (DAG) we propose a reliability-aware scheduling algorithm for precedence constrained tasks, which can achieve high quality of reliability for applications. The comparison studies, based on both randomly generated graphs and the graphs of some real applications, show that our scheduling algorithm outperforms the existing scheduling algorithms in terms of makespan, scheduling length ratio, and reliability. At the same time, the improvement gained by our algorithm increases as the data communication among tasks increases.
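A common way to quantify reliability in this setting (and a reasonable reading of "reliability overheads" here, though not necessarily the paper's exact formulation) is to assume each processor fails at a constant rate, so a task of length t completes on processor p with probability exp(-λ_p·t), and a schedule's reliability is the product of these terms. A small Python illustration with made-up failure rates:

```python
import math

# Failure rates (failures per hour) of three heterogeneous processors -- made-up values.
lam = {"p0": 1e-5, "p1": 5e-5, "p2": 2e-5}

# (task, processor, execution time in hours) for one candidate schedule -- made up.
schedule = [("t1", "p0", 2.0), ("t2", "p1", 1.5), ("t3", "p2", 3.0)]

# With constant failure rate λ_p, P(task of length t succeeds on p) = exp(-λ_p * t);
# assuming independent failures, the schedule's reliability is the product over tasks,
# i.e. exp(-Σ λ_p * t) over all placements.
reliability = math.exp(-sum(lam[p] * t for _, p, t in schedule))
print(f"schedule reliability = {reliability:.6f}")
```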

4.
This paper addresses the problem of scheduling parallel programs represented as directed acyclic task graphs for execution on distributed memory parallel architectures. Because of the high communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing fine grain tasks into single coarser ones so that the overall execution time is minimized. The task clustering problem is NP-hard, even when the number of processors is unbounded and task duplication is allowed. A simple greedy algorithm is presented for this problem which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Indeed, the quality of the schedule improves as the granularity of the task graph becomes larger. For example, if the granularity is at least 1/2, the makespan of the schedule is at most 5/3 times optimal. For a task graph with n tasks and e inter-task communication constraints, the algorithm runs in O(n(n lg n+e)) time, which is n times faster than the currently best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication.
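Task clustering is usually described as "edge zeroing": merging the endpoints of a communication edge into one cluster removes that edge's cost, and a greedy pass accepts a merge only if the estimated makespan does not grow. The sketch below illustrates that generic idea with a deliberately simplified makespan estimate (it ignores the serialization of tasks inside a cluster); it is not the paper's 2-approximation algorithm, and all task and edge weights are made up.

```python
from functools import lru_cache

work = {"a": 2, "b": 3, "c": 4, "d": 1}                               # task execution times (made up)
edges = {("a", "b"): 5, ("a", "c"): 2, ("b", "d"): 4, ("c", "d"): 1}  # communication costs (made up)
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def critical_path(zeroed):
    """Longest work+communication path, counting zeroed (intra-cluster) edges as free.
    Simplification: tasks placed in one cluster are still assumed to run in parallel."""
    @lru_cache(maxsize=None)
    def bottom(v):
        return work[v] + max(
            ((0 if (v, s) in zeroed else edges[(v, s)]) + bottom(s) for s in succ[v]),
            default=0)
    return max(bottom(v) for v in work)

zeroed = set()
for e in sorted(edges, key=edges.get, reverse=True):        # consider heaviest edges first
    if critical_path(frozenset(zeroed | {e})) <= critical_path(frozenset(zeroed)):
        zeroed.add(e)                                       # merge only if the estimate does not grow
print("zeroed edges:", sorted(zeroed), "estimated makespan:", critical_path(frozenset(zeroed)))
```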

5.
On cluster resource allocation for multiple parallel task graphs
Many scientific applications can be structured as parallel task graphs (PTGs), that is, graphs of data-parallel tasks. Adding data parallelism to a task-parallel application provides opportunities for higher performance and scalability, but poses additional scheduling challenges. In this paper, we study the off-line scheduling of multiple PTGs on a single, homogeneous cluster. The objective is to optimize performance without compromising fairness among the PTGs. We consider the range of previously proposed scheduling algorithms applicable to this problem, from both the applied and the theoretical literature, and we propose minor improvements when possible. Our main contribution is an extensive evaluation of these algorithms in simulation, using both synthetic and real-world application configurations, with two different metrics for performance and one metric for fairness. We identify a handful of algorithms that provide good trade-offs when considering all these metrics. The best algorithm overall is one that structures the schedule as a sequence of phases of increasing duration based on a makespan guarantee produced by an approximation algorithm.

6.
Network processors are designed to handle the inherently parallel nature of network processing applications. However, partitioning and scheduling of application tasks and data allocation to reduce memory contention remain as major challenges in realizing the full performance potential of a given network processor. The large variety of processor architectures in use and the increasing complexity of network applications further aggravate the problem. This work proposes a novel framework, called FEADS, for automating the task of application partitioning and scheduling for network processors. FEADS uses the simulated annealing approach to perform design space exploration of application mapping onto processor resources. Further, it uses cyclic and r-periodic scheduling to achieve higher throughput schedules. To evaluate dynamic performance metrics such as throughput and resource utilization under realistic workloads, FEADS automatically generates a Petri net (PN) which models the application, architectural resources, mapping and the constructed schedule and their interaction. The throughput obtained by schedules constructed by FEADS is comparable to that obtained by manual scheduling for linear task flow graphs; for more complicated task graphs, FEADS’ schedules have a throughput which is up to 2.5 times higher than that of the manual schedules. Further, static scheduling of tasks results in an increase in throughput by up to 30% compared to an implementation of the same mapping without task scheduling.
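Simulated annealing, the search strategy FEADS builds on, perturbs a candidate mapping and accepts worse mappings with a probability that decays as the temperature cools. The sketch below shows that generic loop with a toy load-balancing cost; in FEADS the cost would instead come from the Petri-net evaluation of throughput, which is not reproduced here.

```python
import math, random

def simulated_annealing(tasks, procs, cost, t0=10.0, cooling=0.95, iters=200):
    """Generic simulated-annealing loop for mapping `tasks` onto `procs`.
    `cost(mapping)` returns the value to minimise (e.g. the inverse of throughput)."""
    random.seed(0)
    mapping = {t: random.choice(procs) for t in tasks}
    best = dict(mapping)
    temp = t0
    for _ in range(iters):
        cand = dict(mapping)
        cand[random.choice(tasks)] = random.choice(procs)    # perturb one assignment
        delta = cost(cand) - cost(mapping)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            mapping = cand                                   # accept the (possibly worse) move
            if cost(mapping) < cost(best):
                best = dict(mapping)
        temp *= cooling                                      # cool down
    return best

# Toy cost: balance unit-time tasks across processing elements (a stand-in objective).
tasks, procs = list("abcdef"), ["pe0", "pe1", "pe2"]
load = lambda m: max(sum(1 for t in m if m[t] == p) for p in procs)
print(simulated_annealing(tasks, procs, load))
```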

7.
Scheduling large task graphs is an important issue in parallel computing. In this paper we tackle the following two problems: (1) how to schedule a task graph when it is too large to fit into memory, and (2) how to build a generic program such that the parameter values of a task graph can be given at run-time. Our answers feature the parameterized task graph (PTG), which is a symbolic representation of the task graph. We propose a dynamic scheduling algorithm which takes a PTG as input and allows us to generate a generic program. We present a theoretical study which shows that our algorithm finds good schedules for coarse-grain task graphs, has a low memory cost, and a low computational complexity. When the average number of operations of each task is large enough, we prove that the scheduling overhead is negligible with respect to the makespan. We also provide experimental results that demonstrate the feasibility of our approach using several compute-intensive kernels found in numerical scientific applications.

8.
On parallelizing the multiprocessor scheduling problem
Existing heuristics for scheduling a node and edge weighted directed task graph to multiple processors can produce satisfactory solutions but incur high time complexities, which are exacerbated in more realistic environments with relaxed assumptions. Consequently, these heuristics do not scale well and cannot handle problems of moderate sizes. A natural approach to reducing complexity, while aiming for a similar or potentially better solution, is to parallelize the scheduling algorithm. This can be done by partitioning the task graphs and concurrently generating partial schedules for the partitioned parts, which are then concatenated to obtain the final schedule. The problem, however, is nontrivial as there exist dependencies among the nodes of a task graph which must be preserved for generating a valid schedule. Moreover, the time clock for scheduling is global for all the processors (that are executing the parallel scheduling algorithm), making the inherent parallelism invisible. In this paper, we introduce a parallel algorithm that is guided by a systematic partitioning of the task graph to perform scheduling using multiple processors. The algorithm schedules both the tasks and messages, is suitable for graphs with arbitrary computation and communication costs, and is applicable to systems with arbitrary network topologies using homogeneous or heterogeneous processors. We have implemented the algorithm on the Intel Paragon and compared it with three closely related algorithms. The experimental results indicate that our algorithm yields higher quality solutions while using an order of magnitude smaller scheduling times. The algorithm also exhibits an interesting trade-off between the solution quality and speedup while scaling well with the problem size.

9.

SRAM-based FPGAs feature high performance and flexibility. Thus, they have found many applications in modern high-performance computing (HPC) systems. These systems, however, suffer from limited computing resources when running HPC applications. Therefore, multi-FPGA systems have emerged to alleviate such resource limitations. In this regard, efficient scheduling strategies are required to dynamically steer the execution of applications—represented as task graphs—on a set of connected FPGAs. In this paper, a heuristic-based dynamic critical path-aware scheduling technique named CPA is presented to schedule task graphs on multi-FPGA systems. The proposed technique, by considering the computation and communication capabilities of FPGAs, dynamically assigns priority to tasks in different steps in order to achieve better makespans. The proposed technique has been evaluated by conducting several experiments on real-world task graphs and on three different shapes of random task graphs with different numbers of tasks, and its efficiency has been compared with that of three task graph scheduling approaches. The obtained results demonstrate that the proposed CPA technique outperforms well-known heuristic scheduling strategies and improves their makespan by 13.47% on average. In addition, the experiments show that the proposed technique generates the schedules in the order of milliseconds and that the makespans it yields are, on average, 12.05% longer than those of an optimum schedule.


10.
A grid scheduling algorithm for meta-tasks mixing data-intensive and compute-intensive tasks
Grid computing is an emerging research field that follows Internet computing. A grid system is composed of heterogeneous resources, and a good task scheduling method can fully exploit the processing capacity of the grid and reduce task completion time. Based on the current usage patterns of grid systems, a realistic user task model is proposed in which a task consists of two parts, data transfer and computation, and computation starts only after all inputs have been received. Multiple such independent tasks form a meta-task, which serves as the smallest execution unit for the scheduler. In practical applications, a meta-task is typically a mix of data-intensive and compute-intensive tasks. Taking into account the effect of the ratio between data transfer and computation on meta-task completion, a new scheduling algorithm, TCR, is proposed, which reduces the meta-task completion time by improving the utilization of computing resources and the parallelism among tasks. The algorithm is described in detail, and comparisons based on simulation results validate its good performance.
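To illustrate the task model described above (a data-transfer part followed by a compute part, with computation starting only after all input has arrived), the sketch below uses a generic minimum-completion-time assignment in which transfers on a machine's link overlap with computation of earlier tasks on that machine. It is not the TCR algorithm itself, and all bandwidths, speeds, and task sizes are made up.

```python
# (name, input data in MB, compute time in seconds on a reference machine) -- made-up values
tasks = [("t1", 100, 5.0), ("t2", 10, 20.0), ("t3", 60, 2.0)]
bandwidth = {"m1": 50.0, "m2": 20.0}     # link bandwidth in MB/s -- made up
speed = {"m1": 1.0, "m2": 2.0}           # relative CPU speed -- made up

link_free = {m: 0.0 for m in bandwidth}  # when each machine's network link becomes free
cpu_free = {m: 0.0 for m in bandwidth}   # when each machine's CPU becomes free

for name, data, work in tasks:
    best = None
    for m in bandwidth:
        arrive = link_free[m] + data / bandwidth[m]   # the input transfer is serialised on the link
        start = max(cpu_free[m], arrive)              # computation waits for both the CPU and the data
        finish = start + work / speed[m]
        if best is None or finish < best[0]:
            best = (finish, m, arrive)
    finish, m, arrive = best
    link_free[m], cpu_free[m] = arrive, finish        # link/CPU busy until transfer/compute end
    print(f"{name} -> {m}: input ready at {arrive:.2f}, finishes at {finish:.2f}")
```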

11.
Many complex applications in scientific and engineering computing rely on scientific workflow technology. In supercomputing, scientific workflows are often modeled as parallel task graphs, and effective scheduling of these graphs is important for efficient application execution. This paper presents a scheduling model for parallel task graphs under resource constraints, derives several optimal scheduling results for Fork-Join parallel task graphs, and proposes a new scheduling algorithm for general parallel task graphs. The algorithm accounts for the impact of data communication overhead on resource allocation and scheduling performance, and improves the existing CPA algorithm in specific cases. Experiments comparing the new algorithm with the widely used CPR and CPA algorithms show that it achieves good scheduling results. The proposed scheduling algorithm and the optimal scheduling conclusions provide a useful reference for developing high-performance scheduling functionality in workflow application systems.

12.
Although various strategies have been developed for scheduling parallel applications with independent tasks, very little work exists for scheduling tightly coupled parallel applications on cluster environments. In this paper, we compare four different strategies based on performance models of tightly coupled parallel applications for scheduling the applications on clusters. In addition to algorithms based on existing popular optimization techniques, we also propose a new algorithm called Box Elimination that searches the space of performance model parameters to determine the best schedule of machines. By means of real and simulation experiments, we evaluated the algorithms on single-cluster and multi-cluster setups. We show that our Box Elimination algorithm generates up to 80% more efficient schedules than other algorithms. We also show that the execution times of the schedules produced by our algorithm are more robust against performance modeling errors. Copyright © 2009 John Wiley & Sons, Ltd.

13.
To satisfy the high-performance requirements of application executions, many kinds of task scheduling algorithms have been proposed. Among them, duplication-based scheduling algorithms achieve higher performance compared to others. However, because of their greedy nature, they duplicate the parents of each task as long as the finish time can be reduced, which leads to superfluous consumption of resources. Many of these duplications are unnecessary, because slightly delaying some non-critical tasks does not affect the overall makespan. Moreover, the redundant duplications occupy resources, delay the execution of subsequent tasks, and consequently increase the schedule makespan. In this paper, we propose a novel duplication-based algorithm designed to overcome the above drawbacks. The proposed algorithm schedules tasks with the fewest redundant duplications. An optimization scheme is further introduced to search for and remove redundancy in the schedules generated by the proposed algorithm. Randomly generated directed acyclic graphs and two real-world applications are tested in our experiments. Experimental results show that the proposed algorithm can save up to 15.59% of resource consumption compared with the other algorithms. The makespan is improved as well.

14.
Task scheduling in heterogeneous distributed computing systems plays a crucial role in reducing the makespan and maximizing resource utilization. The diverse nature of the devices in heterogeneous distributed computing systems intensifies the complexity of scheduling the tasks. To overcome this problem, a new list-based static task scheduling algorithm, namely Deadline-Aware-Longest-Path-of-all-Predecessors (DA-LPP), is proposed in this article. In the prioritization phase of the DA-LPP algorithm, the path length of the current task from each of its predecessors at each level is computed, and the longest of these path lengths is assigned as the rank of the task. This strategy emphasizes the tasks on the critical path. This well-optimized prioritization phase leads to an observable reduction in the makespan of the applications. In the processor selection phase, the DA-LPP algorithm implements an improved insertion-based policy that effectively utilizes the unoccupied leftover free time slots of the processors, which improves resource utilization; furthermore, a least-computation-cost allocation approach is followed to minimize the overall computation cost of the processors, and a parental prioritization policy is incorporated to further reduce the scheduling length. To demonstrate the robustness of the proposed algorithm, a synthetic graph generator is used in this experiment to generate a huge variety of graphs. Apart from the synthetic graphs, real-world application graphs like Montage, LIGO, Cybershake, and Epigenomic are also considered to grade the performance of the DA-LPP algorithm. Experimental results of the DA-LPP algorithm show improvement in performance in terms of scheduling length ratio, makespan reduction rate, and resource reduction rate when compared with other algorithms like DQWS, DUCO, DCO and EPRD. The results reveal that for a 1000-task set with a deadline equal to two times the critical path, the scheduling length ratio of the DA-LPP algorithm is better than DQWS by 35%, DUCO by 23%, DCO by 26%, and EPRD by 17%.
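The prioritization idea can be illustrated with a short computation of "longest path from all predecessors" ranks on a toy DAG; this is a simplified reading of the phase described above (no deadlines, average costs) rather than the authors' exact DA-LPP formulation, and all costs are made up.

```python
cost = {"A": 3, "B": 2, "C": 4, "D": 5}                              # computation costs -- made up
comm = {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 3, ("C", "D"): 1}  # edge communication costs -- made up
pred = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

rank = {}
for t in ["A", "B", "C", "D"]:                                       # any topological order
    longest = max((rank[p] + comm[(p, t)] for p in pred[t]), default=0)
    rank[t] = longest + cost[t]                                      # longest path ending at t

# Among the tasks that are ready, a list scheduler would pick the one with the
# highest rank first, which emphasises tasks lying on the critical path.
print(rank)   # {'A': 3, 'B': 6, 'C': 9, 'D': 15}
```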

15.
Programming with parallel tasks leads to task graphs with dependencies representing a parallel program. Scheduling algorithms are employed to find an efficient execution order of the parallel tasks. A large variety of scheduling algorithms exist, including layer-based scheduling algorithms for homogeneous target platforms that build consecutive layers of independent parallel tasks and schedule each layer separately. Although these scheduling algorithms provide good results in terms of scheduling algorithm runtime and schedule execution time, the resulting schedules leave room for optimization. This article proposes an optimization for arbitrary layer-based scheduling algorithms, which is called the Move-blocks algorithm. Given a layer-based schedule of the parallel tasks, this algorithm moves blocks of parallel tasks into preceding layers in order to reduce the overall execution time of a task-based application. Suitable blocks of parallel tasks are identified by the algorithm Find-blocks, which is employed together with the Move-blocks algorithm. The algorithm Move-blocks is applied to four well-known scheduling algorithms. A detailed evaluation for a wide range of test cases is given. Copyright © 2010 John Wiley & Sons, Ltd.

16.
Power efficiency is one of the main challenges in large-scale distributed systems such as datacenters, Grids, and Clouds. One can study the scheduling of applications in such large-scale distributed systems by representing applications as a set of precedence-constrained tasks and modeling them by a Directed Acyclic Graph. In this paper we address the problem of scheduling a set of tasks with precedence constraints on a heterogeneous set of Computing Resources (CRs) with the dual objective of minimizing the overall makespan and reducing the aggregate power consumption of CRs. Most of the related works in this area use the Dynamic Voltage and Frequency Scaling (DVFS) approach to achieve these objectives. However, DVFS requires special hardware support that may not be available on all processors in large-scale distributed systems. In contrast, we propose a novel two-phase solution called PASTA that does not require any special hardware support. In its first phase, it uses a novel algorithm to select a subset of available CRs for running an application that can balance between lower overall power consumption of CRs and shorter makespan of application task schedules. In its second phase, it uses a low-complexity power-aware algorithm that creates a schedule for running application tasks on the selected CRs. We show that the overall time complexity of PASTA is $O(p \cdot v^{2})$, where $p$ is the number of CRs and $v$ is the number of tasks. Using simulation experiments on real-world task graphs, we show that the makespans of schedules produced by PASTA are approximately 20% longer than those produced by the well-known HEFT algorithm. However, the schedules produced by PASTA consume nearly 60% less energy than those produced by HEFT. Empirical experiments on a physical test-bed confirm the power efficiency of PASTA in comparison with HEFT as well.

17.
A compact task graph representation for real-time scheduling
A new task graph representation, namely the compact task graph (CTG), is developed to aid in the scheduling of a set of communicating periodic real-time tasks. This representation explicitly expresses the potential for parallelism across tasks as well as the idle times that may be encountered within application tasks. A CTG based scheduler can generate schedules that are able to meet deadlines by interleaving the execution of tasks on a single processor and/or overlapping the execution of tasks on multiple processors. The construction of a CTG is based upon the busy-idle execution profiles for the tasks generated by the compiler. The profiles are computed assuming that sufficient resources are available for parallel execution of all tasks. Thus, they expose all opportunities for overlapped and interleaved execution. The compiler analyzes the profiles to identify useful opportunities for interleaving and expresses them in the CTG without unnecessary partitioning of the tasks. The CTG is powerful because it expresses schedules that are not expressed by existing approaches for constructing task graphs. Schedules can be generated efficiently since a CTG's construction minimizes the splitting of tasks. We briefly demonstrate the usefulness of CTGs in scheduling real-time tasks and scheduling monitoring tasks without affecting the timing of application tasks. Supported in part by the National Science Foundation through a Presidential Young Investigator Award CCR-9157371 and Grant CCR-9212020.

18.
Applications implemented on critical systems are subject to both safety-critical and real-time constraints. Classically, applications are specified as precedence task graphs that must be scheduled onto a given target multiprocessor heterogeneous architecture. We propose a new method for simultaneously optimizing two objectives: the execution time and the reliability of the schedule. The problem is decomposed into two successive steps: a spatial allocation during which the reliability is maximized (randomized algorithm), and a scheduling during which the makespan is minimized (list scheduling algorithm). It allows us to produce several trade-off solutions, among which the user can choose the solution that best fits the application’s requirements. Reliability is increased by replicating adequate tasks onto well chosen processors. Our fault model assumes that processors are fail-silent, that they are subject to transient failures, and that the occurrences of failures follow a constant-parameter Poisson law. We assess and validate our method by running extensive simulations on both random graphs and actual application graphs. They show that it is competitive, in terms of makespan, compared to existing reference scheduling methods for heterogeneous processors (HEFT), while providing better reliability.
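Under the stated fault model (fail-silent processors, transient failures arriving as a constant-rate Poisson process), the benefit of replication can be made explicit: a task fails only if every replica fails. The short Python illustration below uses made-up rates and durations; it shows the standard formula, not the paper's allocation procedure.

```python
import math

# With a constant failure rate λ_p (Poisson transient failures), a task of length t
# succeeds on processor p with probability exp(-λ_p * t).  If the task is replicated
# on several processors, it fails only when ALL replicas fail.
def task_reliability(replicas):
    """replicas: list of (failure_rate, execution_time) pairs, one per copy of the task."""
    p_all_fail = 1.0
    for lam, t in replicas:
        p_all_fail *= 1.0 - math.exp(-lam * t)
    return 1.0 - p_all_fail

print(task_reliability([(1e-4, 3.0)]))                   # single copy
print(task_reliability([(1e-4, 3.0), (2e-4, 4.0)]))      # one extra replica: higher reliability
```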

19.
This paper addresses scheduling problems for tasks with release and execution times. We present a number of efficient and easy-to-implement algorithms for constructing schedules of minimum makespan when the number of distinct task execution times is fixed. For a set of independent tasks, our algorithm in the single processor case runs in time linear in the number of tasks; with precedence constraints, our algorithm runs in time linear in the sum of the number of tasks and the size of the precedence constraints. In the multi-processor case, our algorithm constructs minimum makespan schedules for independent tasks with uniform execution times. The algorithm runs in O(n log m) time, where n is the number of tasks and m is the number of processors. Received September 25, 1997; revised June 11, 1998.
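For intuition on the single-processor case: with independent tasks, scheduling in non-decreasing release-time order with no unnecessary idling already minimizes the makespan (the classical rule for 1|r_j|C_max). The sketch below shows that simple rule with made-up tasks; the paper's contribution is the stronger linear-time and multi-processor results, which this does not reproduce.

```python
# Earliest-release-date order minimises the makespan on one processor for
# independent, non-preemptive tasks with release times.
tasks = [("t1", 4, 3), ("t2", 0, 2), ("t3", 6, 1)]   # (name, release time, execution time) -- made up

time = 0
for name, release, length in sorted(tasks, key=lambda x: x[1]):   # earliest release first
    start = max(time, release)        # wait for the release time if the machine is idle
    time = start + length
    print(f"{name}: start {start}, finish {time}")
print("makespan:", time)
```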

20.
The growth of energy consumption has been explosive in current data centers, supercomputers, and public cloud systems. This explosion has led to greater advocacy of green computing, and many efforts focus on task scheduling in order to reduce energy dissipation. To obtain greater energy reduction while maintaining quality of service by meeting deadlines, this paper proposes a DVFS-enabled Energy-efficient Workflow Task Scheduling algorithm: DEWTS. By merging relatively inefficient processors and reclaiming slack time, DEWTS can leverage the useful slack time repeatedly after servers are merged. DEWTS first calculates the initial scheduling order of all tasks and obtains the overall makespan and deadline based on the Heterogeneous Earliest Finish Time (HEFT) algorithm. By re-sorting the processors according to their number of running tasks and energy utilization, underutilized processors can be merged by shutting down the last node and redistributing the tasks assigned to it. Finally, in the task slacking phase, tasks can be distributed into the idle slots at a lower voltage and frequency using the DVFS technique, without violating the dependency constraints or increasing the slacked makespan. Experiments on a large number of randomly generated DAG workflows show that DEWTS can reduce total power consumption by up to 46.5% for various parallel applications while balancing scheduling performance.
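DEWTS starts from a HEFT schedule, and HEFT orders tasks by their upward rank. The sketch below shows that standard rank computation on a toy DAG with made-up average costs; the DEWTS-specific processor-merging and DVFS slack-reclamation phases are only summarized in comments, not implemented.

```python
# Upward rank used by HEFT (the baseline DEWTS starts from):
#   rank_u(t) = w(t) + max over successors s of (c(t, s) + rank_u(s)),
# where w and c are average computation and communication costs.
w = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 5.0}                  # average execution costs -- made up
c = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "D"): 2.0, ("C", "D"): 1.0}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

rank_u = {}
for t in ["D", "C", "B", "A"]:                                 # reverse topological order
    rank_u[t] = w[t] + max((c[(t, s)] + rank_u[s] for s in succ[t]), default=0.0)

order = sorted(rank_u, key=rank_u.get, reverse=True)           # HEFT schedules tasks in this order
print(order, rank_u)
# DEWTS then merges under-utilised processors and uses DVFS to run tasks that
# have slack at a lower voltage/frequency without stretching the overall makespan.
```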
