期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

章军章立生韩承德《软件学报》1999,10(11):1156-1162

在分布式内存多处理机DMM(distributed memory multiprocessor)系统中,不同处理机上运行的任务之间的通信开销仍然很大,有时甚至抵消了多处理机并行所带来的好处.为了使并行程序在DMM系统上能得以高效的执行,必须采用合理的调度技术将任务分配给处理机.文章首先分别给出了任务调度系统中的任务模型、处理机模型以及调度问题的形式化描述,然后在此基础上研究了任务调度中3个最重要的问题,即(1) 如何顺序选择参与调度的任务,(2) 如何选择路由,(3) 如何分配任务给处理机.其中,路由选择相似文献

2.

A comparison of multiprocessor task scheduling algorithms with communication costs

Reakook Hwang Mitsuo Gen Hiroshi Katayama 《Computers & Operations Research》2008

Both parallel and distributed network environment systems play a vital role in the improvement of high performance computing. Of primary concern when analyzing these systems is multiprocessor task scheduling. Therefore, this paper addresses the challenge of multiprocessor task scheduling parallel programs, represented as directed acyclic task graph (DAG), for execution on multiprocessors with communication costs. Moreover, we investigate an alternative paradigm, where genetic algorithms (GAs) have recently received much attention, which is a class of robust stochastic search algorithms for various combinatorial optimization problems. We design the new encoding mechanism with a multi-functional chromosome that uses the priority representation—the so-called priority-based multi-chromosome (PMC). PMC can efficiently represent a task schedule and assign tasks to processors. The proposed priority-based GA has show effective performance in various parallel environments for scheduling methods. 相似文献

3.

A high performance algorithm for static task scheduling in heterogeneous distributed computing systems 总被引：2，自引：0，他引：2

Mohammad I. DaoudAuthor Vitae Nawwaf Kharma 《Journal of Parallel and Distributed Computing》2008

Effective task scheduling is essential for obtaining high performance in heterogeneous distributed computing systems (HeDCSs). However, finding an effective task schedule in HeDCSs requires the consideration of both the heterogeneity of processors and high interprocessor communication overhead, which results from non-trivial data movement between tasks scheduled on different processors. In this paper, we present a new high-performance scheduling algorithm, called the longest dynamic critical path (LDCP) algorithm, for HeDCSs with a bounded number of processors. The LDCP algorithm is a list-based scheduling algorithm that uses a new attribute to efficiently select tasks for scheduling in HeDCSs. The efficient selection of tasks enables the LDCP algorithm to generate high-quality task schedules in a heterogeneous computing environment. The performance of the LDCP algorithm is compared to two of the best existing scheduling algorithms for HeDCSs: the HEFT and DLS algorithms. The comparison study shows that the LDCP algorithm outperforms the HEFT and DLS algorithms in terms of schedule length and speedup. Moreover, the improvement in performance obtained by the LDCP algorithm over the HEFT and DLS algorithms increases as the inter-task communication cost increases. Therefore, the LDCP algorithm provides a practical solution for scheduling parallel applications with high communication costs in HeDCSs. 相似文献

4.

Space efficient execution of deterministic parallel programs

Simpson D.J. Burton F.W. 《IEEE transactions on pattern analysis and machine intelligence》1999,25(6):870-882

We model a deterministic parallel program by a directed acyclic graph of tasks, where a task can execute as soon as all tasks preceding it have been executed. Each task can allocate or release an arbitrary amount of memory (i.e., heap memory allocation can be modeled). We call a parallel schedule “space efficient” if the amount of memory required is at most equal to the number of processors times the amount of memory required for some depth-first execution of the program by a single processor. We describe a simple, locally depth-first scheduling algorithm and show that it is always space efficient. Since the scheduling algorithm is greedy, it will be within a factor of two of being optimal with respect to time. For the special case of a program having a series-parallel structure, we show how to efficiently compute the worst case memory requirements over all possible depth-first executions of a program. Finally, we show how scheduling can be decentralized, making the approach scalable to a large number of processors when there is sufficient parallelism 相似文献

5.

Optimizing layer‐based scheduling algorithms for parallel tasks with dependencies

R. Kunis G. Rünger 《Concurrency and Computation》2011,23(8):827-849

Programming with parallel tasks leads to task graphs with dependencies representing a parallel program. Scheduling algorithms are employed to find an efficient execution order of the parallel tasks. A large variety of scheduling algorithms exist, including layer‐based scheduling algorithms for homogeneous target platforms that build consecutive layers of independent parallel tasks and schedule each layer separately. Although these scheduling algorithms provide good results in terms of scheduling algorithm runtime and schedule execution time, the resulting schedules leave room for optimization. This article proposes an optimization for arbitrary layer‐based scheduling algorithms, which is called Move‐blocks algorithm. Given a layer‐based schedule of the parallel tasks, this algorithm moves blocks of parallel tasks into preceding layers in order to reduce the overall execution time of a task‐based application. Suitable blocks of parallel tasks are identified by the algorithm Find‐blocks, which is employed together with the Move‐blocks algorithm. The algorithm Move‐blocks is applied to four well‐known scheduling algorithms. A detailed evaluation for a wide range of test cases is given. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

6.

Task clustering and scheduling for distributed memory parallelarchitectures

Palis M.A. Jing-Chiou Liou Wei D.S.L. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(1):46-55

This paper addresses the problem of scheduling parallel programs represented as directed acyclic task graphs for execution on distributed memory parallel architectures. Because of the high communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing fine grain tasks into single coarser ones so that the overall execution time is minimized. The task clustering problem is NP-hard, even when the number of processors is unbounded and task duplication is allowed. A simple greedy algorithm is presented for this problem which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Indeed, the quality of the schedule improves as the granularity of the task graph becomes larger. For example, if the granularity is at least 1/2, the makespan of the schedule is at most 5/3 times optimal. For a task graph with n tasks and e inter-task communication constraints, the algorithm runs in O(n(n lg n+e)) time, which is n times faster than the currently best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication 相似文献

7.

Power and performance management for parallel computations in clouds and data centers

《Journal of Computer and System Sciences》2016,82(2):174-190

We address scheduling independent and precedence constrained parallel tasks on multiple homogeneous processors in a data center with dynamically variable voltage and speed as combinatorial optimization problems. We consider the problem of minimizing schedule length with energy consumption constraint and the problem of minimizing energy consumption with schedule length constraint on multiple processors. Our approach is to use level-by-level scheduling algorithms to deal with precedence constraints. We use a simple system partitioning and processor allocation scheme, which always schedules as many parallel tasks as possible for simultaneous execution. We use two heuristic algorithms for scheduling independent parallel tasks in the same level, i.e., SIMPLE and GREEDY. We adopt a two-level energy/time/power allocation scheme, namely, optimal energy/time allocation among levels of tasks and equal power supply to tasks in the same level. Our approach results in significant performance improvement compared with previous algorithms in scheduling independent and precedence constrained parallel tasks. 相似文献

8.

On-line scheduling of scalable real-time tasks on multiprocessor systems 总被引：1，自引：0，他引：1

Wan Yeon Lee Sung Je Hong Jong Kim 《Journal of Parallel and Distributed Computing》2003,63(12):1315-1324

The computation time of scalable tasks depends on the number of processors allocated to them in multiprocessor systems. As more processors are allocated to a scalable task, the overall computation time of the task decreases but the total amount of processors’ time devoted to the execution of the task, called workload, increases due to parallel execution overhead. In this paper, we propose a task scheduling algorithm that utilizes the property of scalable tasks for on-line and real-time scheduling. In the proposed algorithm, the total workload of all scheduled tasks is reduced by managing processors allocated to the tasks as few as possible without missing their deadlines. As a result, the processors in the system have less load to execute the scheduled tasks and can execute more newly arriving tasks before their deadlines. Simulation results show that the proposed algorithm performs significantly better than the conventional algorithm based on a fixed number of processors to execute each task. 相似文献

9.

多核平台下XEN虚拟机动态调度算法研究 总被引：1，自引：0，他引：1

黄漾《计算技术与自动化》2014,(3):84-87

虚拟机调度算法对并行任务的执行效率考虑不够充分。现代处理器平台具备了多个可用的计算核心,使多个虚拟机并发执行成为了现实。针对多核平台下的并行虚拟机调度优化问题,提出一种基于任务特征虚拟机CON-Credit调度算法。该算法在调度并行任务时,使用动态方式对计算机核心进行分配,采用传统的虚拟机调度算法为执行普通任务的虚拟机进行分配;采用定制的同步算法给执行并行任务的虚拟机分进分配。相关实验显示,CON-Credit调度算法能显著提高并行任务的执行效率。相似文献

10.

Data‐aware task scheduling on heterogeneous hybrid memory multiprocessor systems

Junjie Chen Kenli Li Zhuo Tang Chubo Liu Yan Wang Keqin Li 《Concurrency and Computation》2016,28(17):4443-4459

In this paper, we propose a method about task scheduling and data assignment on heterogeneous hybrid memory multiprocessor systems for real‐time applications. In a heterogeneous hybrid memory multiprocessor system, an important problem is how to schedule real‐time application tasks to processors and assign data to hybrid memories. The hybrid memory consists of dynamic random access memory and solid state drives when considering the performance of solid state drives into the scheduling policy. To solve this problem, we propose two heuristic algorithms called improvement greedy algorithm and the data assignment according to the task scheduling algorithm, which generate a near‐optimal solution for real‐time applications in polynomial time. We evaluate the performance of our algorithms by comparing them with a greedy algorithm, which is commonly used to solve heterogeneous task scheduling problem. Based on our extensive simulation study, we observe that our algorithms exhibit excellent performance and demonstrate that considering data allocation in task scheduling is significant for saving energy. We conduct experiments on two heterogeneous multiprocessor systems. Copyright © 2016 John Wiley & Sons, Ltd. 相似文献

11.

一个调度Fork-Join任务图的新算法 总被引：17，自引：1，他引：16

刘振英方滨兴姜誉张毅赵宏《软件学报》2002,13(4):693-697

任务调度是影响工作站网络效率的关键因素之一.Fork-Join任务图可以代表很多并行结构,但其他已有调度Fork-Join任务图算法忽略了在非全互连工作站网络环境中通信之间不能并行执行的问题,有些效率高的算法又没有考虑节省处理器个数的问题.因此,专门针对该任务图,综合考虑调度长度、非并行通信和节省处理器个数问题,提出了一个基于任务复制的静态调度算法TSA_FJ.通过随机产生任务的执行时间和通信时间,生成了多个Fork-Join任务图,并且采用TSA_FJ算法和其他调度算法对生成的任务图进行调度.结果表明, 相似文献

12.

基于时间Petri网的多处理机的调度算法

韩咚陈波《微机发展》2007,17(6):15-17

任务调度是并行分布式计算机中最有挑战性的问题之一。如何合理有效地进行任务调度将直接影响到系统的并行效率。文中通过将任务图转换为时间petri网的方法,利用求时间petri网的可覆盖树的方法来分析网系统的状态变化和变迁的发生序列,从而求出关键路径和顺序队列。再将该队列分配到处理机上,来缩短相关任务图的调度长度。相似文献

13.

云中多媒体应用中基于混合DAG的最优任务调度研究

郭雅琼宋建新《计算机科学》2015,42(Z11):413-416

云计算的平台优势使得它在多媒体应用中得到广泛使用。由于多媒体服务的多样性和异构性,如何将多媒体任务有效地调度至虚拟机进行处理成为当前多媒体应用的研究重点。对此,研究了云中多媒体最优任务调度问题,首先引入有向无环图来模拟任务中的优先级及任务之间的依赖性,分别对串行、并行、混合结构任务调度模型进行任务调度研究,根据有限资源成本将关键路径中任务节点融合,提出一种实用的启发式近似最优调度方法。实验结果表明,所提调度方法能够以最短的执行时间在有限的资源成本下完成最优的任务分配。相似文献

14.

Graph grammar‐driven parallel partial differential equation solver

Maciej Paszy&#x;ski Robert Schaefer 《Concurrency and Computation》2010,22(9):1063-1097

The paper presents an extension of the composite programmable graph grammar (CP‐graph grammar) suitable for modeling the parallel direct solver algorithm utilized by the hp finite element method (hp‐FEM). In the proposed graph grammar model, the computational mesh is represented by a CP‐graph. The presented graph grammar models the solver algorithm by a set of graph grammar productions. The graph grammar model makes it possible to examine the concurrency of the algorithm by analyzing the interdependence between the atomic tasks, tasks and super‐tasks. The atomic tasks correspond to the graph grammar productions, representing basic undividable parts of the algorithms. The level of atomic tasks models the concurrency for the shared memory architectures. On the other hand, the tasks correspond to the groups of atomic tasks with predefined inter‐task communication channels. They constitute the grain for the decomposition of the parallel algorithm for the distributed memory architecture. Finally, the super‐tasks correspond to a group of tasks resulting from the execution of load balancing algorithm. The solver algorithm is tested on distributed memory linux cluster for up to 192 processors. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

15.

一个调度Out-Tree任务图的启发式算法

下载免费PDF全文

旷文张建军蒋宏《计算机工程与应用》2013,49(12):47-49

任务调度问题是并行分布式计算中的挑战性问题之一。大多数实际的调度算法是启发式的因而常常具有改进的余地。针对Out-Tree任务图这一基本结构提出一个基于任务复制的启发式调度算法,该算法在确保最短调度长度的同时,注重处理器的负载平衡,以达到节约处理器的目的。比较性实验的结果表明,该算法确保了最短调度长度且使用的处理器最少。因而,该算法提高了系统的利用率,避免消耗过多的资源,实际应用性更好。相似文献

16.

一种面向同构集群系统的并行任务节能调度优化方法 总被引：1，自引：0，他引：1

李新贾智平鞠雷赵衍恒宗子良《计算机学报》2012,35(3):591-602

节能调度算法设计是高性能计算领域中的一个研究热点.复制调度算法能够减少后继任务等待延时,缩短任务总体调度时间,但是耗费了更多的能量.为此,作者提出一种启发式处理器合并优化方法 PRO.该方法按照任务最早开始时间和最早结束时间查找处理器时间空隙,将轻负载处理器上的任务重新分配到其它处理器上,从而减少使用的处理器数目,降低系统总体能耗.实验结果表明,和已有的复制任务调度算法TDS、EAD和PEBD相比,优化后的调度算法在不增加调度时间的条件下,能够明显减少使用的处理器数和系统总体能耗,从而更好地实现性能和能耗之间的平衡. 相似文献

17.

Real-Time Atomic Multicast Algorithms Implemented on a Shared Memory Multiprocessor

van der Stok P. D. V. Janssen-Raemaekers A. H. T. 《Real-Time Systems》2003,24(1):55-91

Processes are distributed over processors that share directly accessible memory. Processes exchange messages via memory. Algorithms are specified which satisfy the atomic multicast requirements under two fault hypotheses: (1) fail-stop memory and fail stop processes and (2) fail-stop memory and processes with timing failures. It is shown that the algorithms can be applied under different scheduling conditions: (1) hard real-time (HRT) applications composed of periodically scheduled tasks and (2) HRT applications with tasks scheduled according to a static off-line calculated schedule. The performance measurements on an implementation of two algorithms are presented. 相似文献

18.

FEADS: A Framework for Exploring the Application Design Space on Network Processors

Rajani Pai R. Govindarajan 《International journal of parallel programming》2007,35(1):1-31

Network processors are designed to handle the inherently parallel nature of network processing applications. However, partitioning and scheduling of application tasks and data allocation to reduce memory contention remain as major challenges in realizing the full performance potential of a given network processor. The large variety of processor architectures in use and the increasing complexity of network applications further aggravate the problem. This work proposes a novel framework, called FEADS, for automating the task of application partitioning and scheduling for network processors. FEADS uses the simulated annealing approach to perform design space exploration of application mapping onto processor resources. Further, it uses cyclic and r-periodic scheduling to achieve higher throughput schedules. To evaluate dynamic performance metrics such as throughput and resource utilization under realistic workloads, FEADS automatically generates a Petri net (PN) which models the application, architectural resources, mapping and the constructed schedule and their interaction. The throughput obtained by schedules constructed by FEADS is comparable to that obtained by manual scheduling for linear task flow graphs; for more complicated task graphs, FEADS’ schedules have a throughput which is upto 2.5 times higher compared to the manual schedules. Further, static scheduling of tasks results in an increase in throughput by upto 30% compared to an implementation of the same mapping without task scheduling. 相似文献

19.

基于多核处理器的关联任务并行感知调度算法

梁秋玲张向利张红梅闫坤《计算机工程》2021,47(7):212-217

关联任务在多核处理器上并行调度所产生的通信时延,会对任务调度长度和处理器利用率造成负面影响,为了改善多核系统对关联任务的处理性能,针对关联任务在多核处理器上的调度特点,提出一种并行感知调度算法。计算各任务与终点间的最长路径值,按照该值的降序来分配任务调度次序,在分配处理器内核时兼顾关联度和任务最早可执行时间,设置最佳匹配评价函数。实验结果表明,与busHEFT和DTSV算法相比,该算法具有更短的任务调度时延、更少的通信量以及更高的处理器利用率。相似文献

20.

一种基于多处理器任务复制的分簇调度算法 总被引：2，自引：1，他引：1

徐成赵林祥杨志邦《计算机应用研究》2012,29(8):2931-2934

任务调度的优劣是决定并行分布式计算机系统性能好坏的重要因素之一。为优化任务调度,基于一些典型算法(如LG、PPA算法等),提出了一种新的任务调度算法。该算法一方面复制满足条件的前驱任务来缩短调度长度;另一方面合理地复制其他前驱任务和合并冗余簇来减少所需处理器的数目。实验表明,该算法在调度长度和所需处理器的数目上优于以上典型算法,并具有更小的时间复杂度,对并行计算机系统性能的提升具有一定的意义。相似文献