首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
This paper improves lower bounds on the minimum number of processors and minimum time to execute a given directed acyclic task graph with communication delays. The given task graph is partitioned into a number of independent task graphs to arrive at sharper bounds. A drawback of the earlier best method is pointed out and corrected. The time taken to calculate the bounds is also less in comparison to the corrected method. These bounds are useful in comparing heuristic algorithms used for scheduling directed acyclic task graphs and as compiler tools in static scheduling-on parallel computers.  相似文献   

2.
A lower bound on the number of processors and finish time for the problem of scheduling precedence graphs with communication costs is presented. The notion of the earliest starting time of a task is formulated for the context of lower bounds. A lower bound on the completion time is proposed. A task delay which does not increase the earliest completion time of a schedule is defined. Each task can then be scheduled within a time interval without affecting the lower bound performance on the finish time. This leads to definition of a new lower bound on the number of processors required to process the task graph. A derivation of the minimum time increase over the earliest completion time is also proposed for the case of a smaller number of processors. A lower bound on the minimum number of interprocessor communication links required to achieve optimum performance is proposed. Evaluation had been carried out by using a set of 360 small graphs. The bound on the finish time deviates at most by 5% from the optimum solution in 96% of the cases and performs well with respect to the minimum number of processors and communication links  相似文献   

3.
Stochastic bounds are obtained on execution times of parallel programs when the number of processors is unlimited. A parallel program is considered to consist of interdependent tasks with synchronization constraints. These constraints are described by an acyclic directed graph called a task graph. The execution times of tasks are considered to be independently identically distributed (i.i.d.) random variables. The performance measure of interest is the overall execution of the considered parallel program (task graph). Stochastic bound methods are applied to obtain lower and upper bounds on this measure. Another upper bound is obtained for parallel programs having `new better than used in expectation' (NBUE) random variables as task execution times. NBUE random variables are replaced with exponential random variables of the same mean to derive this upper bound  相似文献   

4.
Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application tasks with dependences. These applications exhibit both task and data parallelism, and combining these two (also called mixed parallelism) has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task and data parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks, and the intertask data communication volumes. A locality-conscious scheduling strategy is used to improve intertask data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications and synthetic graphs shows that our algorithm consistently generates schedules with a lower makespan as compared to Critical Path Reduction (CPR) and Critical Path and Allocation (CPA), two previously proposed scheduling algorithms. Our algorithm also produces schedules that have a lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.  相似文献   

5.
Abstract. We study the average number of delays suffered by packets routed using greedy (work conserving) scheduling policies. We obtain tight bounds on the worst-case average number of delays in a few cases as follows. First, we show that the worst-case average number of delays is a function of the number of sources of packets, which is interesting in case a node may send many packets. Then, using a new concept we call delay race , we prove a tight bound on the average number of delays in a leveled graph. Finally, using delay races in a more involved way, we prove nearly tight bounds on the average number of delays in directed acyclic graphs (DAGs). The upper bound for DAGs is expressed in terms of the underlying topology, and as a result it holds for any acyclic set of routes, even if they are not shortest paths. The lower bound for DAGs, on the other hand, holds even for shortest paths routes.  相似文献   

6.
Multi-core real-time scheduling for generalized parallel task models   总被引:1,自引:0,他引:1  
Multi-core processors offer a significant performance increase over single-core processors. They have the potential to enable computation-intensive real-time applications with stringent timing constraints that cannot be met on traditional single-core processors. However, most results in traditional multiprocessor real-time scheduling are limited to sequential programming models and ignore intra-task parallelism. In this paper, we address the problem of scheduling periodic parallel tasks with implicit deadlines on multi-core processors. We first consider a synchronous task model where each task consists of segments, each segment having an arbitrary number of parallel threads that synchronize at the end of the segment. We propose a new task decomposition method that decomposes each parallel task into a set of sequential tasks. We prove that our task decomposition achieves a resource augmentation bound of 4 and 5 when the decomposed tasks are scheduled using global EDF and partitioned deadline monotonic scheduling, respectively. Finally, we extend our analysis to a directed acyclic graph (DAG) task model where each node in the DAG has a unit execution requirement. We show how these tasks can be converted into synchronous tasks such that the same decomposition can be applied and the same augmentation bounds hold. Simulations based on synthetic workload demonstrate that the derived resource augmentation bounds are safe and sufficient.  相似文献   

7.
Code-profiling is the process of determining the types of codes found in a given heterogeneous task. Once this information is available, it is desirable to know how many processors are needed for each of the code types. In this paper, we propose two methods for estimating the minimum number of processors required for each of these code types. The first method involves making use of task compatibility graphs. We show that a task compatibility graph can be generated by analyzing certain compatible relations between task module pairs of a given task flow graph. We define the resource (processor) minimization problem therefore to be equivalent to finding the minimal number of cliques that cover the task compatibility graph, or to finding the minimal number of colors that color the vertices of its complement graph, called task conflict graph. We solve this problem using a greedy approach in O(¦V¦log¦V¦¦E¦) time, where ¦V¦ and ¦E¦ are the number of vertices and edges of the task compatibility graph. We further show that for three special types of task compatibility graphs, optimal solution can be obtained in polynomial time. The second method studied in this paper uses the Cluster-M methodology for estimating the minimum number of processors. Examples are shown to compare the estimated results obtained using different techniques.  相似文献   

8.
Abstract

Minimizing the amount of time and number of processors needed to perform an application reduces the application's fabrication cost and operation costs. A directed acyclic graph (dag) model of algorithms is used to define a time-minimal schedule and a processor-time-minimal schedule, We present a technique for finding a lower bound on the number of processors needed to achieve a given schedule of an algorithm. The application of this technique is illustrated with a tensor product computation. We then apply the technique to the free schedule of algorithms for matrix product, Gaussian elimination, and transitive closure. For each, we provide a time-minimal processor schedule that meets these processor lower bounds, including the one for tensor product.  相似文献   

9.
A practical join processing strategy that allows effective utilization of arbitrary degrees of parallelism in both the I/O subsystem and join processing subsystems is presented. Analytic bounds on the minimum execution time, minimum number of processors, and processor utilization are presented along with bounds on the execution time, given a fixed number of processors. These bounds assume that sufficient buffers are available. An analytic lower bound on buffer requirements as well as a practical heuristic for use in limited buffer environments are also presented. A sampling of corroborative simulation results are included  相似文献   

10.
This paper determines upper bounds on the expected time complexity for a variety of parallel algorithms for undirected and directed random graph problems. For connectivity, biconnectivity, transitive closure, minimum spanning trees, and all pairs minimum cost paths, we prove the expected time to beO(log logn) for the CRCW PRAM (this parallel RAM machine allows resolution of write conflicts) andO(logn · log logn) for the CREW PRAM (which allows simultaneous reads but not simultaneous writes). We also show that the problem of graph isomorphism has expected parallel timeO(log logn) for the CRCW PRAM andO(logn) for the CREW PRAM. Most of these results follow because of upper bounds on the mean depth of a graph, derived in this paper, for more general graphs than was known before. For undirected connectivity especially, we present a new probabilistic algorithm which runs on a randomized input and has an expected running time ofO(log logn) on the CRCW PRAM, withO(n) expected number of processors only. Our results also improve known upper bounds on the expected space required for sequential graph algorithms. For example, we show that the problems of finding connected components, transitive closure, minimum spanning trees, and minimum cost paths have expected sequential spaceO(logn · log logn) on a deterministic Turing Machine. We use a simulation of the CRCW PRAM to get these expected sequential space bounds.  相似文献   

11.
Regular arrays of processing elements in VLSI have proved to be suitable for high-speed execution of many matrix operations. To execute an arbitrary computational algorithm on such processing arrays, it has been suggested mapping the given algorithm directly onto a regular array. The computational algorithm is represented by a data-flow graph whose nodes are to be mapped onto processors in the VLSI array. This study examines the complexity of mapping data-flow graphs onto square and hexagonal arrays of processors. We specifically consider the problem of routing data from processors in a given (source) sequence to another (target) sequence. We show that under certain conditions, the above problem is equivalent to the one of finding a minimum-diameter cyclic arrangement. The complexity of the latter problem is analyzed and upper and lower bounds on the number of intermediate rows of processors (between the source and target rows) are derived.  相似文献   

12.
In a majority conversion process, the vertices of a graph can be in one of the two states, colored or uncolored, and these states are dynamically updated so that a vertex becomes colored at a certain time period if at least half of its neighbors were in the colored state in the previous time period. A dynamic monopoly is a set of vertices in a graph that when initially colored will eventually cause all vertices in the graph to become colored. This paper establishes a connection between dynamic monopolies and the well-known feedback vertex sets which are sets of vertices whose removal results in an acyclic graph. More specifically, we show that dynamic monopolies and feedback vertex sets are equivalent in graphs wherein all vertices have degree 2 or 3. We use this equivalence to provide exact values for the minimum size of dynamic monopolies of planar hexagonal grids, as well as upper and lower bounds on the minimum size of dynamic monopolies of cylindrical and toroidal hexagonal grids. For these last two topologies, the respective upper and lower bounds differ by at most one.  相似文献   

13.
In this paper we consider the symbolic scheduling of partitioned loop programs which are modeled as iterative task graphs (ITGs). Each task in such a graph is coarse grained and contains a large chunk of computations. The weights of computation and communication vary from one iteration to another depending on the index value of the loop. The goal of scheduling such graphs is to incorporate the symbolic variables in weight functions and loop bounds and provide an asymptotically optimal schedule and predict its performance accurately. We provide a lower bound for optimal scheduling when weights of iterative task graphs change monotonically in the course of iterations and there is a sufficient number of processors. We present a technique that devises a valid symbolic schedule without searching all task instances and examine the asymptotic performance of this schedule compared to an optimal solution. Finally, we present case studies and experimental results on nCUBE-2 to verify our solutions.  相似文献   

14.
On the granularity and clustering of directed acyclic task graphs   总被引:1,自引:0,他引:1  
The authors consider the impact of the granularity on scheduling task graphs. Scheduling consists of two parts, the processors assignment of tasks, also called clustering, and the ordering of tasks for execution in each processor. The authors introduce two types of clusterings: nonlinear and linear clusterings. A clustering is nonlinear if two parallel tasks are mapped in the same cluster otherwise it is linear. Linear clustering fully exploits the natural parallelism of a given directed acyclic task graph (DAG) while nonlinear clustering sequentializes independent tasks to reduce parallelism. The authors also introduce a new quantification of the granularity of a DAG and define a coarse grain DAG as the one whose granularity is greater than one. It is proved that every nonlinear clustering of a coarse grain DAG can be transformed into a linear clustering that has less or equal parallel time than the nonlinear one. This result is used to prove the optimality of some important linear clusterings used in parallel numerical computing  相似文献   

15.
This paper addresses the problem of scheduling parallel programs represented as directed acyclic task graphs for execution on distributed memory parallel architectures. Because of the high communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing fine grain tasks into single coarser ones so that the overall execution time is minimized. The task clustering problem is NP-hard, even when the number of processors is unbounded and task duplication is allowed. A simple greedy algorithm is presented for this problem which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Indeed, the quality of the schedule improves as the granularity of the task graph becomes larger. For example, if the granularity is at least 1/2, the makespan of the schedule is at most 5/3 times optimal. For a task graph with n tasks and e inter-task communication constraints, the algorithm runs in O(n(n lg n+e)) time, which is n times faster than the currently best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication  相似文献   

16.
When executing processes on parallel computer systems a major bottle-neck is interprocessor communication. One way to address this problem is to minimize the communication between processes that are mapped to different processors. This translates to the k-partitioning problem of the corresponding process graph, where k is the number of processors. The classical spectral lower bound of (|V|/2k)\sum k i=1λ i for the k-section width of a graph is well known. We show new relations between the structure and the eigenvalues of a graph and present a new method to get tighter lower bounds on the k-section width. This method makes use of the level structure defined by the k-section. We define a global expansion property and prove that for graphs with the same k-section width the spectral lower bound increases with this global expansion. We also present examples of graphs for which our new bounds are tight up to a constant factor.  相似文献   

17.
This paper determines upper bounds on the expected time complexity for a variety of parallel algorithms for undirected and directed random graph problems. For connectivity, biconnectivity, transitive closure, minimum spanning trees, and all pairs minimum cost paths, we prove the expected time to beO(log logn) for the CRCW PRAM (this parallel RAM machine allows resolution of write conflicts) andO(logn · log logn) for the CREW PRAM (which allows simultaneous reads but not simultaneous writes). We also show that the problem of graph isomorphism has expected parallel timeO(log logn) for the CRCW PRAM andO(logn) for the CREW PRAM. Most of these results follow because of upper bounds on the mean depth of a graph, derived in this paper, for more general graphs than was known before.For undirected connectivity especially, we present a new probabilistic algorithm which runs on a randomized input and has an expected running time ofO(log logn) on the CRCW PRAM, withO(n) expected number of processors only.Our results also improve known upper bounds on the expected space required for sequential graph algorithms. For example, we show that the problems of finding connected components, transitive closure, minimum spanning trees, and minimum cost paths have expected sequential spaceO(logn · log logn) on a deterministic Turing Machine. We use a simulation of the CRCW PRAM to get these expected sequential space bounds.This research was supported by National Science Foundation Grant DCR-85-03251 and Office of Naval Research Contract N00014-80-C-0647.This research was partially supported by the National Science Foundation Grants MCS-83-00630, DCR-8503497, by the Greek Ministry of Research and Technology, and by the ESPRIT Basic Research Actions Project ALCOM.  相似文献   

18.
Improving scheduling of tasks in a heterogeneous environment   总被引:1,自引:0,他引:1  
Optimal scheduling of parallel tasks with some precedence relationship, onto a parallel machine is known to be NP-complete. The complexity of the problem increases when task scheduling is to be done in a heterogeneous environment, where the processors in the network may not be identical and take different amounts of time to execute the same task. We introduce a task duplication-based scheduling algorithm for network of heterogeneous systems (TANH), with complexity O(V/sup 2/), which provides optimal results for applications represented by directed acyclic graphs (DAGs), provided a simple set of conditions on task computation and network communication time could be satisfied. The performance of the algorithm is illustrated by comparing the scheduling time with an existing "best imaginary level scheduling (BIL)" scheme for heterogeneous systems. The scalability for a higher or lower number of processors, as per their availability is also discussed. We have shown to provide substantial improvement over existing work on the task duplication-based scheduling algorithm (TDS).  相似文献   

19.
The communication overhead is a major bottleneck for the execution of a process graph on a parallel computer system. In the case of two processors, the minimization of the communication can be modeled using the graph bisection problem. The spectral lower bound of λ2|V|/4 for the bisection width of a graph is widely known. The bisection width is equal to λ2|V|/4 iff all vertices are incident to λ2/2 cut edges in every optimal bisection.

We present a new method of obtaining tighter lower bounds on the bisection width. This method makes use of the level structure defined by the bisection. We define some global expansion properties and we show that the spectral lower bound increases with this global expansion. Under certain conditions we obtain a lower bound depending on λ2β|V| with . We also present examples of graphs for which our new bounds are tight up to a constant factor. As a by-product, we derive new lower bounds for the bisection widths of 3- and 4-regular Ramanujan graphs.  相似文献   


20.
The problem of unification of terms is log-space complete for P. In deriving this lower bound no use is made of the potentially concise representation of terms by directed acyclic graphs. In addition, the problem remains complete even if infinite substitutions are allowed. A consequence of this result is that parallelism cannot significantly improve on the best sequential solutions for unification. However, we show that for the problem of term matching, an important subcase of unification, there is a good parallel algorithm using O(log2n) time and nO(1) processors on a PRAM. For the O(log2n) parallel time upper bound we assume that the terms are represented by directed acyclic graphs; if the longer string representation is used we obtain an O(log n) parallel time bound.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号