首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
There has been a recent increase of interest in heterogeneous computing systems, due partly to the fact that a single parallel architecture may not be adequate for exploiting all of a program's available parallelism. In some cases, heterogeneous systems have been shown to produce higher performance for lower cost than a single large machine. However, there has been only limited work on developing techniques and frameworks for partitioning and scheduling applications across the components of a heterogeneous system. In this paper we propose a general model for describing and evaluating heterogeneous systems that considers the degree of uniformity in the processing elements and the communication channels as a measure of the heterogeneity in the system. We also propose a class of dynamic scheduling algorithms for a heterogeneous computing system interconnected with an arbitrary communication network. These algorithms execute a novel optimization technique to dynamically compute schedules based on the potentially non-uniform computation and communication costs on the processors of a heterogeneous system. A unique aspect of these algorithms is that they easily adapt to different task granularities, to dynamically varying processor and system loads, and to systems with varying degrees of heterogeneity. Our simulations are designed to facilitate the evaluation of different scheduling algorithms under varying degrees of heterogeneity. The results show improved performance for our algorithms compared to the performance resulting from existing scheduling techniques.  相似文献   

2.
With the advent of new routing methods, the distance that a message is sent is becoming relatively less and less important. Thus, assuming no link contention, permutation seems to be an efficient collective communication primitive. In this paper, we present several algorithms for decomposing all-to-many personalized communication into a set of disjoint partial permutations. We discuss several algorithms and study their effectiveness from the view of static scheduling as well as run-time scheduling. An approximate analysis shows that with n processors, and assuming that every processor sends and receives d messages to random destinations, our algorithm can perform the scheduling in O(dn In d) time, on average, and can use an expected number of d+log d partial permutations to carry out the communication. We present experimental results of our algorithms on the CM-5  相似文献   

3.
The multiprocessor scheduling problem is the problem of scheduling the tasks of a precedence constrained task graph (representing a parallel program) onto the processors of a multiprocessor in a way that minimizes the completion time. Since this problem is known to be NP-hard in the strong sense in all but a few very restricted eases, heuristic algorithms are being developed which obtain near optimal schedules in a reasonable amount of computation time. We present an efficient heuristic algorithm for scheduling precedence constrained task graphs with nonnegligible intertask communication onto multiprocessors taking contention in the communication channels into consideration. Our algorithm for obtaining satisfactory suboptimal schedules is based on the classical list scheduling strategy. It simultaneously exploits the schedule-holes generated in the processors and in the communication channels during the scheduling process in order to produce better schedules. We demonstrate the effectiveness of our algorithm by comparing with two competing heuristic algorithms available in the literature  相似文献   

4.
On exploiting task duplication in parallel program scheduling   总被引:1,自引:0,他引:1  
One of the main obstacles in obtaining high performance from message-passing multicomputer systems is the inevitable communication overhead which is incurred when tasks executing on different processors exchange data. Given a task graph, duplication-based scheduling can mitigate this overhead by allocating some of the tasks redundantly on more than one processor. In this paper, we focus on the problem of using duplication in static scheduling of task graphs on parallel and distributed systems. We discuss five previously proposed algorithms and examine their merits and demerits. We describe some of the essential principles for exploiting duplication in a more useful manner and, based on these principles, propose an algorithm which outperforms the previous algorithms. The proposed algorithm generates optimal solutions for a number of task graphs. The algorithm assumes an unbounded number of processors. For scheduling on a bounded number of processors, we propose a second algorithm which controls the degree of duplication according to the number of available processors. The proposed algorithms are analytically and experimentally evaluated and are also compared with the previous algorithms  相似文献   

5.
Effective task scheduling is essential for obtaining high performance in heterogeneous distributed computing systems (HeDCSs). However, finding an effective task schedule in HeDCSs requires the consideration of both the heterogeneity of processors and high interprocessor communication overhead, which results from non-trivial data movement between tasks scheduled on different processors. In this paper, we present a new high-performance scheduling algorithm, called the longest dynamic critical path (LDCP) algorithm, for HeDCSs with a bounded number of processors. The LDCP algorithm is a list-based scheduling algorithm that uses a new attribute to efficiently select tasks for scheduling in HeDCSs. The efficient selection of tasks enables the LDCP algorithm to generate high-quality task schedules in a heterogeneous computing environment. The performance of the LDCP algorithm is compared to two of the best existing scheduling algorithms for HeDCSs: the HEFT and DLS algorithms. The comparison study shows that the LDCP algorithm outperforms the HEFT and DLS algorithms in terms of schedule length and speedup. Moreover, the improvement in performance obtained by the LDCP algorithm over the HEFT and DLS algorithms increases as the inter-task communication cost increases. Therefore, the LDCP algorithm provides a practical solution for scheduling parallel applications with high communication costs in HeDCSs.  相似文献   

6.
科学与工程计算中的很多复杂应用问题需要使用科学工作流技术,超算领域中的科学工作流常以并行任务图建模,并行任务图的有效调度对应用的高效执行有重要意义。给出了资源限制条件下并行任务图的调度模型;针对Fork-Join类并行任务图给出了若干最优化调度结论;针对一般并行任务图提出了一种新的调度算法,该算法考虑了数据通信开销对资源分配和调度性能的影响,并对已有的CPA算法在特定情况下进行了改进。通过实验与常用的CPR和CPA算法做比较,验证了提出的新算法能够获得很好的调度效果。本文提出的调度算法和得到的最优调度结论对工作流应用系统的高性能调度功能开发具有借鉴意义。  相似文献   

7.
On Task Scheduling Accuracy: Evaluation Methodology and Results   总被引:1,自引:1,他引:1  
Many heuristics based on the directed acyclic graph (DAG) have been proposed for the static scheduling problem. Most of these algorithms apply a simple model of the target system that assumes fully connected processors, a dedicated communication sub-system and no contention for the communication resources. Only a few algorithms consider the network topology and the contention for the communication resources. This article evaluates the accuracy of task scheduling algorithms and thus the appropriateness of the applied models. An evaluation methodology is proposed and applied to a representative set of scheduling algorithms. The obtained results show a significant inaccuracy of the produced schedules. Analyzing these results is important for the development of more appropriate models and more accurate scheduling algorithms.  相似文献   

8.
We consider a graph theoretical model and study a parallel implementation of the well-known Gaussian elimination method on parallel distributed memory architectures, where the communication delay for the transmission of an elementary data is higher than the computation time of an elementary instruction. We propose and analyze two low-complexity algorithms for scheduling the tasks of the parallel Gaussian elimination on an unbounded number of completely connected processors. We compare these two algorithms with a higher-complexity general-purpose scheduling algorithm, the DSC heuristic, proposed by A. Gerasoulis and T. Yang (1993)  相似文献   

9.
Contention-aware scheduling with task duplication   总被引:1,自引:0,他引:1  
Finding an efficient schedule for a task graph on several processors is a trade-off between maximising concurrency and minimising interprocessor communication. Task duplication is a technique that has been employed to reduce or avoid interprocessor communication. Certain tasks are duplicated on several processors to produce the data locally and avoid the communication among processors. Most of the algorithms using task duplication have been proposed for the classic scheduling model, which allows concurrent communication and ignores contention for communication resources. It is increasingly recognised that this classic model is unrealistic and does not permit creating accurate and efficient schedules. The recently proposed contention model introduces contention awareness into task scheduling by assigning the edges of the task graph to the links of the communication network. It is intuitive that scheduling under such a model benefits even more from task duplication, yet no such algorithm has been proposed as it is not trivial to duplicate tasks under the contention model. This paper proposes a contention-aware task duplication scheduling algorithm. We investigate the fundamentals for task duplication in the contention model and propose an algorithm that is based on state-of-the-art techniques found in task duplication and contention-aware algorithms. An extensive experimental evaluation demonstrates the significant improvements to the speedup of the produced schedules.  相似文献   

10.
We consider the problem of scheduling tasks on multiprocessor architectures in the presence of communication delays. Given a set of dependent tasks, the scheduling problem is to allocate the tasks to processors such that the pre-specified precedence constraints among the tasks are obeyed and certain cost-measures (such as the computation time) are minimized. Several cases of the scheduling problem have been proven to be NP-complete. Nevertheless, there are polynomial time algorithms for interesting special cases of the general scheduling problem. Most of these results, however, do not take into consideration the delays due to message passing among processors. In this paper we study the increase in time complexity of scheduling problems due to the introduction of communication delays. In particular, we address the open problem of scheduling Out-forests (In-forests) in a multiprocessor system of m identical processors when communication delays are considered. The corresponding problem of scheduling Out-forests (In-forests) without communication delays admits an elegant polynomial time solution as presented first by Hu in 1961; however, the problem in the presence of communication delays has remained unsolved. We present here first known polynomial time algorithms for the computation of the optimal schedule when the number of available processors is given and bounded and both computation and communication delays are assumed to take one unit of time. Furthermore, we present a linear-time algorithm for computing a near-optimal schedule for unit-delay out-forests. The schedule's length exceeds the optimum by no more than (m-2) time units, where m is the number of processors. Hence for two processors the computed schedule is strictly optimum  相似文献   

11.
Scheduling is a fundamental issue in achieving high performance on metacomputers and computational grids. For the first time, the job scheduling problem for grid computing on metacomputers is studied as a combinatorial optimization problem. A cost model is proposed for modeling communication heterogeneity on computational grids. A processor allocation algorithm is developed which always finds an optimal processor allocation that minimizes the effective execution time of a job when the job is being scheduled. It is proven that the list scheduling (LS) algorithm can achieve reasonable worst-case performance bound in grid environments supporting distributed supercomputing with large applications. We compare the performance of various job scheduling and processor allocation algorithms for grid computing on metacomputers. We evaluate the performance of 128 combinations of two job scheduling algorithms, four initial job ordering strategies, four processor allocation algorithms, and four metacomputers by extensive simulation. It is found that the combination of largest job first (LJF) initial job ordering and minimum effective execution time (MEET) or largest machine first (LMF) processor allocation algorithm yields the best average-case performance, and the choice of FCFS and LS depends on the range of job sizes. It is also observed that communication heterogeneity does have significant impact on schedule lengths.  相似文献   

12.
As transistor sizes shrink, interconnects represent an increasing bottleneck for chip designers. Several groups are developing new interconnection methods and system architectures to cope with this trend. New architectures require new methods for high-level application mapping and hardware/software codesign. We present high-level scheduling and interconnect topology synthesis techniques for embedded multiprocessor systems-on-chip that are streamlined for one or more digital signal processing applications. That is, we seek to synthesize an application-specific interconnect topology. We show that flexible interconnect topologies utilizing low-hop communication between processors offer advantages for reduced power and latency. We show that existing multiprocessor scheduling algorithms can deadlock if the topology graph is not strongly connected, or if a constraint is imposed on the maximum number of hops allowed for communication. We detail an efficient algorithm that can be used in conjunction with existing scheduling algorithms for avoiding this deadlock. We show that it is advantageous to perform application scheduling and interconnect synthesis jointly, and present a probabilistic scheduling/interconnect algorithm that utilizes graph isomorphism to pare the design space.  相似文献   

13.
Many approaches have been described for the parallel loop scheduling problem for shared-memory systems, but little work has been done on the data-dependent loop scheduling problem (nested loops with loop carried dependencies). In this paper, we propose a general model for the data-dependent loop scheduling problem on distributed as well as shared memory systems. In order to achieve load balancing and low runtime scheduling and communication overhead, our model is based on a loop task graph and the notion of critical path. In addition, we develop a heuristic algorithm based on our model and on genetic algorithms to test the reliability of the model. We test our approach on different scenarios and benchmarks. The results are very encouraging and suggest a future parallel compiler implementation based on our model.  相似文献   

14.
研究了将无人机作为通信中继平台对战区实施无线通信覆盖时的信息传输调度算法.首先介绍了几种基于信息特征(如优先级、长度、信息在系统总的占用时间等)的调度算法.为了克服传统调度算法的缺点,提出了一种改进的动态优先权调度(DPS,Dynamic Priority Scheduling)算法,将信息的优先级与接受系统服务的时间联系起来,动态调整信息的优先权,从而获得较小的系统平均时延,而且对不同信息又不失“公平性“.最后给出了几种调度算法的仿真结果,并对结果进行了分析.分析表明,动态优先权调度算法是一种比较适合无人机通信中继的实用调度算法.  相似文献   

15.
《Parallel Computing》1997,23(12):1743-1755
This paper addresses the design problems concerning a large-scale, parallel video-on-demand server that consists of multiple clusters of nodes connected by a high performance interconnection network. In order to efficiently control the flow of video streams, we propose two scheduling algorithms for data retrieval and communication. First, we present a disk scheduling algorithm called round scheduling which fully utilizes disk bandwidth, minimizing the disk idle time while the server retrieves data blocks. Second, a communication scheduling algorithm is developed to guarantee conflict-free communication over the multistage interconnection network that is topologically equivalent to the Omega network. We also show some simulation results on the server configuration. Analysis of tradeoffs between the server utilization and the start-up latency helps to determine the proper number and size of server clusters for a set of given nodes.  相似文献   

16.
The paper deals with the scheduling of tasks on a set of parallel machines subject to precedence constraints and communication delays, that are not fully known before the execution. This is usual in parallel computing (message passing) and in production workshops (part transportation). The goal is to give a priori informations on the behavior of different schedules computed before the execution. This sensitivity analysis helps to choose among different available algorithms. It also provides tools to decide whether it is justified to change the schedule structure at execution time (run-time scheduling).By hypothesis, the communication delays are within a given interval. Worst case sensitivity bounds are proposed that give the maximum loss of performance. These bounds hold for algorithms which are optimal in the deterministic case, but can be easily adapted to arbitrary algorithms. Some applications to specific problems are presented, namely two identical machines, and unbounded number of machines, for tree-like precedence constraints.  相似文献   

17.
Many partitioned scientific programs can be modeled as iterative executions of computational tasks and represented by iterative task graphs (ITGs). An ITG may or may not have dependence cycles. In this paper, we consider the symbolic scheduling of ITGs on distributed memory architectures with nonzero communication overhead and propose heuristic algorithms for scheduling both cyclic and acyclic ITGs without searching an entire iteration space. Our approach incorporates techniques of software pipelining, graph unfolding, directed acyclic graph (DAG) scheduling, and load balancing. We analyze the asymptotic optimality of the algorithms to show that the derived schedules are competitive to optimal solutions. We also study the sensitivity of scheduling performance on inaccurate weights. Finally, we present experimental results to demonstrate the effectiveness of the optimization techniques  相似文献   

18.
This paper focuses on a bi-objective experimental evaluation of online scheduling in the Infrastructure as a Service model of Cloud computing regarding income and power consumption objectives. In this model, customers have the choice between different service levels. Each service level is associated with a price per unit of job execution time, and a slack factor that determines the maximal time span to deliver the requested amount of computing resources. The system, via the scheduling algorithms, is responsible to guarantee the corresponding quality of service for all accepted jobs. Since we do not consider any optimistic scheduling approach, a job cannot be accepted if its service guarantee will not be observed assuming that all accepted jobs receive the requested resources. In this article, we analyze several scheduling algorithms with different cloud configurations and workloads, considering the maximization of the provider income and minimization of the total power consumption of a schedule. We distinguish algorithms depending on the type and amount of information they require: knowledge free, energy-aware, and speed-aware. First, to provide effective guidance in choosing a good strategy, we present a joint analysis of two conflicting goals based on the degradation in performance. The study addresses the behavior of each strategy under each metric. We assess the performance of different scheduling algorithms by determining a set of non-dominated solutions that approximate the Pareto optimal set. We use a set coverage metric to compare the scheduling algorithms in terms of Pareto dominance. We claim that a rather simple scheduling approach can provide the best energy and income trade-offs. This scheduling algorithm performs well in different scenarios with a variety of workloads and cloud configurations.  相似文献   

19.
Multiprocessor mapping and scheduling algorithms have been extensively studied over the past few decades and have been tackled from different perspectives. In the late 1980's, the two-step decomposition of scheduling nto clustering and cluster-scheduling - was introduced. Ever since, several clustering and merging algorithms have been proposed and individually reported to be efficient. However, it is not clear how effective they are and how well they compare against single-step scheduling algorithms or other multistep algorithms. In this paper, we explore the effectiveness of the two-phase decomposition of scheduling and describe efficient and novel techniques that aggressively streamline interprocessor communications and can be tuned to exploit the significantly longer compilation time that is available to embedded system designers. We evaluate a number of leading clustering and merging algorithms using a set of benchmarks with diverse structures. We present an experimental setup for comparing the single-step against the two-step scheduling approach. We determine the importance of different steps in scheduling and the effect of different steps on overall schedule performance and show that the decomposition of the scheduling process indeed improves the overall performance. We also show that the quality of the solutions depends on the quality of the clusters generated in the clustering step. Based on the results, we also discuss why the parallel time metric in the clustering step may not provide an accurate measure for the final performance of cluster-scheduling.  相似文献   

20.
现代并行系统的复杂调度问题可以转化为Fork-join图的任务调度问题.然而在实际计算环境中,两个处理节点之间的通信大多以独占方式进行,现有的大多数任务调度算法往往忽略了对通信信道独占性的考虑.提出了一种带通信限制的Fork-join图调度算法CCTD.该算法引入了实际环境中的通信独占性限制,同时保证了Fork-join图的基于复制的优化调度,而且尽可能地减少了对处理器占用.实验结果表明,CCTD算法是一种适应性强的、高效的Fork-join图调度算法.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号