首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Assume that a real-time programP T consisting of a number of parallel processes is executed on a system having a setPr of processors which are shared between the processes by a real-time schedulerS T. Assume that PT must meet some timing deadlines. We show that such an implementation ofP T can be represented as a transformationL(P T) and that the deadlines ofP T will be met if they are satisfied by the timing properties of the transformed program. The condition for feasibility of a real-time program executed under a scheduler is formalized and rules are provided for verification. The schedulerS T can be specifiedgenerically and applied to different programs, making it unnecessary to introduce low-level operations such as scheduling primitives into the programming language. Thus real-time program specification and Schedulability can be considered in the same framework and the timing properties of a program can be determined at the specification level. By separating the specification of the scheduler from that of the program, the feasibility of an implementation can be proved by considering a scheduling policy rather than its implementation details.  相似文献   

2.
Cluster scheduling, where processors are grouped into clusters and the tasks that are allocated to one cluster are scheduled by a global scheduler, has attracted attention in multiprocessor real-time systems research recently. In this paper, assuming that an optimal global scheduler is adopted within each cluster, we investigate the worst-case utilization bounds for cluster scheduling with different task allocation/partitioning heuristics. First, we develop a lower limit on the utilization bounds for cluster scheduling with any reasonable task allocation scheme. Then, the lower limit is shown to be the exact utilization bound for cluster scheduling with the worst-fit task allocation scheme. For other task allocation heuristics (such as first-fit, best-fit, first-fit decreasing, best-fit decreasing and worst-fit decreasing), higher utilization bounds are derived for systems with both homogeneous clusters (where each cluster has the same number of processors) and heterogeneous clusters (where clusters have different number of processors). In addition, focusing on an efficient optimal global scheduler, namely the boundary-fair (Bfair) algorithm, we propose a period-aware task allocation heuristic with the goal of reducing the scheduling overhead (e.g., the number of scheduling points, context switches and task migrations). Simulation results indicate that the percentage of task sets that can be scheduled is significantly improved under cluster scheduling even for small-size clusters, compared to that of the partitioned scheduling. Moreover, when comparing to the simple generic task allocation scheme (e.g., first-fit), the proposed period-aware task allocation heuristic markedly reduces the scheduling overhead of cluster scheduling with the Bfair scheduler.  相似文献   

3.
The running time and memory requirement of a parallel program with dynamic, lightweight threads depends heavily on the underlying thread scheduler. In this paper we present a new asynchronous, space-efficient scheduling algorithm for shared memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth D and serial space requirement S 1 , we show that the expected space requirement is S 1 + O(K⋅ p⋅ D) on p processors. Here, K is a user-adjustable runtime parameter, which provides a tradeoff between running time and space requirement. Our algorithm achieves good locality and low scheduling overheads by automatically increasing the granularity of the work scheduled on each processor. We have implemented the new scheduling algorithm in the context of a native, user-level implementation of Posix standard threads or Pthreads, and evaluated its performance using a set of C-based benchmarks that have dynamic or irregular parallelism. We compare the performance of our scheduler with that of two previous schedulers: the thread library's original scheduler (which uses a FIFO queue), and a provably space-efficient depth-first scheduler. At a fine thread granularity, our scheduler outperforms both these previous schedulers, but requires marginally more memory than the depth-first scheduler. We also present simulation results on synthetic benchmarks to compare our scheduler with space-efficient versions of both a work-stealing scheduler and a depth-first scheduler. The results indicate that unlike these previous approaches, the new algorithm covers a range of scheduling granularities and space requirements, and allows the user to trade the space requirement of a program with the scheduling granularity. Received June 11, 2000, and in revised form March 19, 2001, and in final form August 7, 2001. Online publication April 5, 2002.  相似文献   

4.
We study online adaptive scheduling for multiple sets of parallel jobs, where each set may contain one or more jobs with time-varying parallelism. This two-level scheduling scenario arises naturally when multiple parallel applications are submitted by different users or user groups in large parallel systems, where both user-level fairness and system-wide efficiency are of important concerns. To achieve fairness, we use the well-known equi-partitioning algorithm to distribute the available processors among the active job sets at any time. For efficiency, we apply a feedback-driven adaptive scheduler that periodically adjusts the processor allocations within each set by consciously exploiting the jobs’ execution history. We show that our algorithm achieves asymptotically competitive performance with respect to the set response time, which incorporates two widely used performance metrics, namely, total response time and makespan, as special cases. Both theoretical analysis and simulation results demonstrate that our algorithm improves upon an existing scheduler that provides only fairness but lacks efficiency. Furthermore, we provide a generalized framework for analyzing a family of scheduling algorithms based on feedback-driven policies with provable efficiency. Finally, we consider an extended multi-level hierarchical scheduling model and present a fair and efficient solution that effectively reduces the problem to the two-level model.  相似文献   

5.
The aim of this paper is to evaluate OpenMP, TBB and Cilk Plus as basic language-based tools for simple and efficient parallelization of recursively defined computational problems and other problems that need both task and data parallelization techniques. We show how to use these models of parallel programming to transform a source code of Adaptive Simpson’s Integration to programs that can utilize multiple cores of modern processors. Using the example of Belman–Ford algorithm for solving single-source shortest path problems, we advise how to improve performance of data parallel algorithms by tuning data structures for better utilization of vector extensions of modern processors. Manual vectorization techniques based on Cilk array notation and intrinsics are presented. We also show how to simplify such optimization using Intel SIMD Data Layout Template containers.  相似文献   

6.
This paper addresses the problem of minimizing the scheduling length (make-span) of a batch of jobs with different arrival times. A job is described by a direct acyclic graph (DAG) of parallel tasks. The paper proposes a dynamic scheduling method that adapts the schedule when new jobs are submitted and that may change the processors assigned to a job during its execution. The scheduling method is divided into a scheduling strategy and a scheduling algorithm. We also propose an adaptation of the Heterogeneous Earliest-Finish-Time (HEFT) algorithm, called here P-HEFT, to handle parallel tasks in heterogeneous clusters with good efficiency without compromising the makespan. The results of a comparison of this algorithm with another DAG scheduler using a simulation of several machine configurations and job types shows that P-HEFT gives a shorter makespan for a single DAG but scores worse for multiple DAGs. Finally, the results of the dynamic scheduling of a batch of jobs using the proposed scheduler method showed significant improvements for more heavily loaded machines when compared to the alternative resource reservation approach.  相似文献   

7.
We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs cache misses when executed by the Cilk scheduler on a machine with P processors, each with a cache of size Z, with high probability. This bound is tighter than previously published bounds. We also present a new multithreaded cache oblivious algorithm for 1D stencil computations incurring cache misses with high probability, one for Gaussian elimination and back substitution, and one for the length computation part of the longest common subsequence problem incurring cache misses with high probability. This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) under contract No. NBCH30390004.  相似文献   

8.
Multiprocessor scheduling in a shared multiprogramming environment can be structured in two levels, where a kernel-level job scheduler allots processors to jobs and a user-level thread scheduler maps the ready threads of a job onto the allotted processors. We present two provably-efficient two-level scheduling schemes called G-RAD and S-RAD respectively. Both schemes use the same job scheduler RAD for the processor allotments that ensures fair allocation under all levels of workload. In G-RAD, RAD is combined with a greedy thread scheduler suitable for centralized scheduling; in S-RAD, RAD is combined with a work-stealing thread scheduler more suitable for distributed settings. Both G-RAD and S-RAD are non-clairvoyant. Moreover, they provide effective control over the scheduling overhead and ensure efficient utilization of processors. We also analyze the competitiveness of both G-RAD and S-RAD with respect to an optimal clairvoyant scheduler. In terms of makespan, both schemes can achieve O(1)-competitiveness for any set of jobs with arbitrary release time. In terms of mean response time, both schemes are O(1)-competitive for arbitrary batched jobs. To the best of our knowledge, G-RAD and S-RAD are the first non-clairvoyant scheduling algorithms that guarantee provable efficiency, fairness and minimal overhead.  相似文献   

9.
In this paper, we study an online scheduling problem with moldable parallel tasks on m processors. Each moldable task can be processed simultaneously on any number of processors of a parallel computer, and the processing time of a moldable task depends on the number of processors allotted to it. Tasks arrive one by one. Upon arrival of each task, the scheduler has to determine both the number of processors and the starting time for the task. Moreover, these decisions cannot be changed in the future. The objective is to attain a schedule such that the longest completion time over all tasks, i.e., the makespan, is minimized. First, we provide a general framework to show that any \(\rho \)-bounded algorithm for scheduling of rigid parallel tasks (the number of processors for a task is fixed a prior) can be extended to yield an algorithm for scheduling of moldable tasks with a competitive ratio of \(4\rho \) if the ratio \(\rho \) is known beforehand. As a consequence, we achieve the first constant competitive ratio, 26.65, for the moldable parallel tasks scheduling problem. Next, we provide an improved algorithm with a competitive ratio of at most 16.74.  相似文献   

10.
Assembling and simultaneously using different types of distributed computing infrastructures (DCI) like Grids and Clouds is an increasingly common situation. Because infrastructures are characterized by different attributes such as price, performance, trust, and greenness, the task scheduling problem becomes more complex and challenging. In this paper we present the design for a fault-tolerant and trust-aware scheduler, which allows to execute Bag-of-Tasks applications on elastic and hybrid DCI, following user-defined scheduling strategies. Our approach, named Promethee scheduler, combines a pull-based scheduler with multi-criteria Promethee decision making algorithm. Because multi-criteria scheduling leads to the multiplication of the possible scheduling strategies, we propose SOFT, a methodology that allows to find the optimal scheduling strategies given a set of application requirements. The validation of this method is performed with a simulator that fully implements the Promethee scheduler and recreates an hybrid DCI environment including Internet Desktop Grid, Cloud and Best Effort Grid based on real failure traces. A set of experiments shows that the Promethee scheduler is able to maximize user satisfaction expressed accordingly to three distinct criteria: price, expected completion time and trust, while maximizing the infrastructure useful employment from the resources owner point of view. Finally, we present an optimization which bounds the computation time of the Promethee algorithm, making realistic the possible integration of the scheduler to a wide range of resource management software.  相似文献   

11.
一种基于资源预分配的虚拟机软实时调度方法   总被引:1,自引:0,他引:1       下载免费PDF全文
虚拟机技术作为云计算的重要技术之一,近年来得到广泛关注,但是由于虚拟机管理层的存在,导致语义鸿沟,使得实时应用程序、并发程序等在虚拟机上的运行性能受到影响。分析和研究了Xen虚拟机管理器的Credit调度算法,针对其在并发调度和软实时调度方面存在的不足,提出了改进调度算法,实现了算法的调度器原型。新的调度算法对软实时虚拟机进行Credit比例预分配,采用动态调度时间片机制,以non-work-conserving方式实现软实时任务周期调度,保障调度周期满足运行周期要求。通过区分并发和非并发软实时虚拟机,采取不同的调度策略,在满足资源利用率的基础上,确保实时任务的顺利运行。测试结果表明,该调度算法在对并发和非并发软实时任务调度上,具有良好的表现,较好满足了软实时应用调度需求。  相似文献   

12.
一种面向多核系统的并行计算任务分配方法   总被引:2,自引:0,他引:2  
随着多核处理器的普及,目前的大规模并行处理系统普遍采用多核处理器,这对于资源管理和调度提出了更高的要求.提出了基于共享Cache资源划分的方法,建立了面向多核处理器支持Cache资源分配的进程调度模型,设计并实现了并行任务到多核处理器的映射算法,更好地解决了大规模资源管理系统中面向多核处理器的任务分配问题,降低了使用共享Cache的多个进程运行时的相互干扰,提升了应用程序性能.  相似文献   

13.
《Parallel Computing》2004,30(5-6):567-583
Recent breakthroughs in the mathematical estimation of parallel genetic algorithm parameters are applied to the NP-complete problem of scheduling multiple tasks on a cluster of computers connected by a shared bus. Numerous adjustments to the original method of parameter estimation were made in order to accurately reflect differences in the problem model. The parallel scheduler used m-ary encoding and included a shared communication bus constraint. Fitness was an indirect computation requiring an evaluation of the meaning and implications (i.e., effect on communication time) of the encoding. The degree of correctness was defined as the “nearness” to the optimal schedule that could be obtained in a limited amount of time. Experiments reveal that the parallel scheduling algorithm developed very accurate schedules when the modified parameter guidelines were used. This article describes the scheduling problem, the parallel genetic scheduler, the adjustments made to the mathematical estimations, the quality of the schedules that were obtained, and the accuracy of the schedules compared to mathematically predicted expected values.  相似文献   

14.
Oh  Dong-Ik  Bakker  T.P. 《Real-Time Systems》1998,15(2):183-192
We consider the schedulability of a set of independent periodic tasks under fixed priority preemptive scheduling on homogeneous multiprocessor systems. Assuming there is no task migration between processors and each processor schedules tasks preemptively according to fixed priorities assigned by the Rate Monotonic policy, the scheduling problem reduces to assigning the set of tasks to disjoint processors in such a way that the Monotonic policy, the scheduling problem reduces to assigning the set of tasks to disjoint processors in such a way that the schedulability of the tasks on each processor can be guaranteed. In this paper we show that the worst case achievable utilization for such systems is between n(21/2-1) and (n+1)/(1+21/(n+1)), where n stands for the number of processors. The lower bound represents 41 percent of the total system capacity and the upper bound represents 50 to 66 percent depending on n. Practicality of the lower bound is demonstrated by proving it can be achieved using a First Fit scheduling algorithm.  相似文献   

15.
The problem of exploiting the effective utilization of a multiprocessor system essentially depends on optimal scheduling of parallel tasks onto the processors in the system. A recently proposed compile-time scheduling algorithm based on the 0–1 linear programming algorithm with the branch and bound technique, to produce optimal schedules, has the problems of communication link contention, nonoptimality, and modeling incompletely connected processor systems. In this paper, we present a modified version of this algorithm for producing contention-free optimal schedules for any arbitrary multiprocessor topology. To alleviate the impediments of large requirements of CPU time for the optimal scheduling algorithm, we have developed three new effective techniques, namely,processor isomorphism, look ahead pruning, andlower bound on the completion time of tasks. The performance of our algorithm is analyzed using DFT and LU decomposition methods as benchmarks.  相似文献   

16.
芦奉良  刘羽  张军 《计算机工程》2011,37(11):77-79,82
针对共享存储多处理机系统中各处理机负载不均衡的问题,提出一种新的任务调度算法--多重波前法.在任务图划分的基础上,采用分层调度方式对原波前法进行改进,通过对任务序列进行多重遍历和重组以降低各处理器的分配误差,利用循环调度算法提高任务调度结果的精度,并给出该算法的并行实现.实验结果证明,该算法具有较低的任务分配误差和较高...  相似文献   

17.
Several schemes for detecting and locating faulty processors through self-diagnosis in multiprocessor systems have been discussed in the past. These schemes attempt to start multiple copies (versions) of the tasks on available idle processors simultaneously and compare the results generated by the copies to detect or locate faulty processors. These schemes are based on FCFS scheduling strategy. But, they cannot be applied directly to real-time multiprocessor systems where tasks have timing constraints. In this paper, we present a new scheduling algorithm that not only schedules real-time tasks, but also attempts to perform self-diagnosis if the system is not heavily loaded. We define load as a function of the tasks' laxities. We have carried out extensive simulations and compared the results of our algorithm with those of the myopic algorithm, a real-time task scheduler. Simulation results show that our algorithm that exploits both the tasks' laxity and spare capacity (unused processors) offers performance (guarantee ratio) comparable to that of the myopic algorithm in addition to achieving fault detection and location  相似文献   

18.
The superblock: An effective technique for VLIW and superscalar compilation   总被引:8,自引:1,他引:7  
A compiler for VLIW and superscalar processors must expose sufficient instruction-level parallelism (ILP) to effectively utilize the parallel hardware. However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries. These techniques are based on a novel structure called thesuperblock. The superblock enables the optimizer and scheduler to extract more ILP along the important execution paths by systematically removing constraints due to the unimportant paths. Superblock optimization and scheduling have been implemented in the IMPACT-I compiler. This implementation gives us a unique opportunity to fully understand the issues involved in incorporating these techniques into a real compiler. Superblock optimizations and scheduling are shown to be useful while taking into account a variety of architectural features.  相似文献   

19.
The main challenge in the area of reinforcement learning is scaling up to larger and more complex problems. Aiming at the scaling problem of reinforcement learning, a scalable reinforcement learning method, DCS-SRL, is proposed on the basis of divide-and-conquer strategy, and its convergence is proved. In this method, the learning problem in large state space or continuous state space is decomposed into multiple smaller subproblems. Given a specific learning algorithm, each subproblem can be solved independently with limited available resources. In the end, component solutions can be recombined to obtain the desired result. To address the question of prioritizing subproblems in the scheduler, a weighted priority scheduling algorithm is proposed. This scheduling algorithm ensures that computation is focused on regions of the problem space which are expected to be maximally productive. To expedite the learning process, a new parallel method, called DCS-SPRL, is derived from combining DCS-SRL with a parallel scheduling architecture. In the DCS-SPRL method, the subproblems will be distributed among processors that have the capacity to work in parallel. The experimental results show that learning based on DCS-SPRL has fast convergence speed and good scalability.  相似文献   

20.
A parallel multithreaded program that is ostensibly deterministic may nevertheless behave nondeterministically due to bugs in the code. These bugs are called determinacy races, and they result when one thread updates a location in shared memory while another thread is concurrently accessing the location. We have implemented a provably efficient determinacy-race detector for Cilk, an algorithmic multithreaded programming language. If a Cilk program is run on a given input data set, our debugging tool, which we call the ``Nondeterminator,' either determines at least one location in the program that is subject to a determinacy race, or else it certifies that the program is race free when run on the data set. The core of the Nondeterminator is an asymptotically efficient serial algorithm (inspired by Tarjan's nearly linear-time least-common-ancestors algorithm) for detecting determinacy races in series-parallel directed acyclic graphs. For a Cilk program that runs in T time on one processor and uses v shared-memory locations, the Nondeterminator runs in O(T α(v,v)) time, where α is Tarjan's functional inverse of Ackermann's function, a very slowly growing function which, for all practical purposes, is bounded above by 4 . The Nondeterminator uses at most a constant factor more space than does the original program. On a variety of Cilk program benchmarks, the Nondeterminator exhibits a slowdown of less than 12 compared with the serial execution time of the original optimized code, which we contend is an acceptable slowdown for debugging purposes. Received November 11, 1997, and in final form September 21, 1998.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号