期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Using processor-cache affinity information in shared-memorymultiprocessor scheduling

Squillante M.S. Lazowska E.D. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(2):131-143

In a shared-memory multiprocessor system, it may be more efficient to schedule a task on one processor than on another if relevant data already reside in a particular processor's cache. The effects of this type of processor affinity are examined. It is observed that tasks continuously alternate between executing at a processor and releasing this processor due to I/O, synchronization, quantum expiration, or preemption. Queuing network models of different abstract scheduling policies are formulated, spanning the range from ignoring affinity to fixing tasks on processors. These models are solved via mean value analysis, where possible, and by simulation otherwise. An analytic cache model is developed and used in these scheduling models to include the effects of an initial burst of cache misses experienced by tasks when they return to a processor for execution. A mean-value technique is also developed and used in the scheduling models to include the effects of increased bus traffic due to these bursts of cache misses. Only a small amount of affinity information needs to be maintained for each task. The importance of having a policy that adapts its behavior to changes in system load is demonstrated 相似文献

2.

Shared processor scheduling

Dariusz Dereniowski Wiesław Kubiak 《Journal of Scheduling》2018,21(6):583-593

We study the shared processor scheduling problem with a single shared processor to maximize total weighted overlap, where an overlap for a job is the amount of time it is processed on its private and shared processor in parallel. A polynomial-time optimization algorithm has been given for the problem with equal weights in the literature. This paper extends that result by showing an \(O(n \log n)\)-time optimization algorithm for a class of instances in which non-decreasing order of jobs with respect to processing times provides a non-increasing order with respect to weights—this instance generalizes the unweighted case of the problem. This algorithm also leads to a \(\frac{1}{2}\)-approximation algorithm for the general weighted problem. The complexity of the weighted problem remains open. 相似文献

3.

网络处理器任务调度 总被引：1，自引：0，他引：1

吴佳骏张兆庆《计算机应用研究》2007,24(8):34-37

按照网络应用及IXP处理器架构特点建立代价模型,并且以网络应用需求为导向提出了基于时间和吞吐量的任务调度算法(LTTS).该算法兼顾网络程序的吞吐量和延迟需求自动完成网络任务的调度工作,并且在这两项评价网络程序性能的重要指标上得到了满意的结果. 相似文献

4.

基于时间帧的处理器PFair调度改进算法

蒋丽丽陈国彬张广泉. 《计算机应用研究》2016,33(4)

为了解决PFair算法进行交互任务调度时,由于忽略了不同阶段的周期性任务而导致多个线程之间任务的迁移问题以及空间和时间的浪费问题,提出了基于时间帧的处理器PFair调度改进算法。该算法基于周期性任务系统的特点,引入时间帧控制和改变本地周期性任务调度来限制任务迁移,从而实现对PFair算法的改进。为了评估算法的迁移开销和公平性,通过实验对普通PFair算法及本文所提出的改进算法ERfair进行对比实验,结果表明,改进算法ERfair能够通过时间帧内调度和分区控制大大降低任务在不同处理器间的迁移次数。基于时间帧的处理器PFair调度改进算法在保证公平性的同时,提高了系统的效率,应用于多核处理器上的任务调度是可行的、有效的。相似文献

5.

Two processor scheduling with limited preemption

Errol L. Lloyd 《Performance Evaluation》1986,6(4):307-312

We introduce deterministic task system scheduling with limited preemption—that is, a preempted task can resume execution only on the processor where it originally executed. Two classes of limited preemptive schedules are introduced, corresponding to whether or not unforced idle time is allowed in the schedule. Our main result is to show that, on two processors, these two classes are equivalent with respect to optimal schedule lengths. We use this result to establish a hierarchy of optimal schedule lengths. 相似文献

6.

Using Timed Petri Net to Model Instruction-Level Loop Scheduling with Resource Constraints

下载免费PDF全文

Wang Jian Christine Eisenbeis Su Bogong 《计算机科学技术学报》1994,9(2):128-143

This paper uses timed petri net to model and analyze the problem of instructionlevel loop scheduling with resource constraints,which has been proven to be an NP complete problem.First,we present a new timed Petri net model to integrate functional unit allocation,register allocation and spilling into a unified theoretical framework.Then we develop a state subgraph,called Register Allocation Solution Graph,which can effectively describe the major behavior of our new model.the main property of this state subgraph is that the number of all its nodes is polynomial.Finally we present and prove that the optimum loop schedules can be found with polynomial computation complexity,for almost all practical loop programs.Our work lightens a new idea of finding the optimum loop schedules. 相似文献

7.

基于粒子群算法的多核处理器线程调度研究

杨洪斌陈伟吴悦《计算机工程与设计》2010,31(5)

为有效解决多核处理器的线程调度问题,提出了一种基于粒子群算法框架上的线程调度算法.该算法依据设计的调度模型,在线程DAG图上通过复制不在同一处理器上且存在相关性的线程,生成相互独立的子DAG图,并采用改进的粒子群优化算法对其进行合理调度,由此提高线程调度效率.仿真实现了该算法,并通过实验数据验证了该算法的优越性. 相似文献

8.

Modelling heterogeneous processor scheduling for real-time systems

《Robotics and Computer》1994,11(2):91-98

A new model is presented to describe data-flow algorithms implemented in a multiprocessing system. Called the resource/data flow graph (RDFG), the model explicitly represents cyclo-static processor schedules as circuits of processor arcs that reflect the order that processors execute graph nodes. The model also allows the guarantee of meeting hard real-time deadlines. When unfolded, the model identifies statistically the processor schedule. The model therefore is useful for determining the throughput and latency of systems with heterogeneous processors. The applicability of the model is demonstrated using a space surveillance algorithm. 相似文献

9.

异构多核处理器的任务调度算法 总被引：1，自引：0，他引：1

下载免费PDF全文

蒋建春汪同庆《计算机工程与应用》2009,45(33):52-56

在研究Min-min、Max-min算法和Sufferage算法基础上,针对异构多核处理器的特点,提出一种任务静态调度算法——自适应分段Sufferage算法（Adaptive Segmented Sufferage,ASS）。该算法以最早完成时间和负载均衡为目标进行任务分配,先将任务分配分成两个阶段：在第一个阶段以最少完成时间作为分配原则进行分配,选择单位时间内节省时间最多的任务先分配;在第二个阶段以负载均衡为分配原则进行分配,选择执行时间大的任务先分配。然后选取不同调节参数,对任务进行多次重新分配,以最小的最大完成时间为最后分配结果,实现自适应调节。通过实验验证,该算法在实现最少完成时间的前提下能很好地达到负载均衡。相似文献

10.

Parallel loop generation and scheduling

Shahriar Lotfi Saeed Parsa 《The Journal of supercomputing》2009,50(3):289-306

Loop tiling is an efficient loop transformation, mainly applied to detect coarse-grained parallelism in loops. It is a difficult task to apply n-dimensional non-rectangular tiles to generate parallel loops. This paper offers an efficient scheme to apply non-rectangular n-dimensional tiles in non-rectangular iteration spaces, to generate parallel loops. In order to exploit wavefront parallelism efficiently, all the tiles with equal sum of coordinates are assumed to reside on the same wavefront. Also, in order to assign parallelepiped tiles on each wavefront to different processors, an improved block scheduling strategy is offered in this paper. 相似文献

11.

A note on optimal preemptive scheduling for two processor systems

Andrzej Rowicki 《Information Processing Letters》1977,6(1):25-28

相似文献

12.

Job scheduling and processor allocation for grid computing on metacomputers

《Journal of Parallel and Distributed Computing》2005,65(11):1406-1418

Scheduling is a fundamental issue in achieving high performance on metacomputers and computational grids. For the first time, the job scheduling problem for grid computing on metacomputers is studied as a combinatorial optimization problem. A cost model is proposed for modeling communication heterogeneity on computational grids. A processor allocation algorithm is developed which always finds an optimal processor allocation that minimizes the effective execution time of a job when the job is being scheduled. It is proven that the list scheduling (LS) algorithm can achieve reasonable worst-case performance bound in grid environments supporting distributed supercomputing with large applications. We compare the performance of various job scheduling and processor allocation algorithms for grid computing on metacomputers. We evaluate the performance of 128 combinations of two job scheduling algorithms, four initial job ordering strategies, four processor allocation algorithms, and four metacomputers by extensive simulation. It is found that the combination of largest job first (LJF) initial job ordering and minimum effective execution time (MEET) or largest machine first (LMF) processor allocation algorithm yields the best average-case performance, and the choice of FCFS and LS depends on the range of job sizes. It is also observed that communication heterogeneity does have significant impact on schedule lengths. 相似文献

13.

Exploiting wavefront parallelism on large-scale shared-memorymultiprocessors

Manjikian N. Abdelrahman T.S. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(3):259-271

Wavefront parallelism, in which parallelism is limited to hyperplanes in an iteration space, can arise when compilers apply tiling to loop nests to enhance locality. Previous approaches for scheduling wavefront parallelism focused on maximizing parallelism; balancing workloads, and reducing synchronization. In this paper, we show that on large-scale shared-memory multiprocessors, locality is a crucial factor. We make the distinction between intratile and intertile locality and show that as the number of processors grows, intertile locality becomes more important. We consider and experimentally evaluate existing strategies for scheduling wavefront parallelism. We show that dynamic self-scheduling can be efficiently used on a small number of processors, but performs poorly at large scale because it does not enhance intertile locality. By contrast, static scheduling strategies enhance intertile locality for small tiles, maintaining parallelism and resulting in better performance at large scale. Results from a Convex SPP1000 multiprocessor demonstrate the importance of taking intertile locality into account. Static scheduling outperforms dynamic self-scheduling by a factor of up to 2.3 on 30 processors 相似文献

14.

一种面向流媒体的多核处理器分组调度算法*

贺鹏程王劲林邓浩江张武《计算机应用研究》2010,27(10):3911-3914

针对流媒体分组处理和多核网络处理器cache亲和性的特点,提出了综合流调度和分组调度优点的两级调度算法,即FBLA。FCFS调度算法可以达到分组级的细粒度负载均衡,但cache亲和性却很差。基于hash的调度算法可以保证很好的cache亲和性,但难以保证核间负载均衡。FBLA算法对这两种算法进行了折中,既通过cache亲和性提高处理器利用率,又能够达到细粒度的核间负载均衡。理论分析和仿真评估表明,FBLA算法具有良好的cache亲和性和负载均衡性,转发延迟和延迟波动比FCFS算法更低。在亲和因子较小时,F 相似文献

15.

Two-level utilization-based processor allocation for scheduling moldable jobs

Wu Ying-Jhih Yu Shuo-Ting Lai Kuan-Chou Chhabra Amit Chang Hsi-Ya Huang Kuo-Chan 《The Journal of supercomputing》2020,76(12):10212-10239

The Journal of Supercomputing - Most modern parallel programs are written with the moldable property. However, most existing parallel computing systems treat such parallel programs as rigid jobs... 相似文献

16.

A performance-based parallel loop scheduling on grid environments

Wen-Chung Shih Chao-Tung Yang Shian-Shyong Tseng 《The Journal of supercomputing》2007,41(3):247-267

The effectiveness of loop self-scheduling schemes has been shown on traditional multiprocessors in the past and computing clusters in the recent years. However, parallel loop scheduling has not been widely applied to computing grids, which are characterized by heterogeneous resources and dynamic environments. In this paper, a performance-based approach, taking the two characteristics above into consideration, is proposed to schedule parallel loop iterations on grid environments. Furthermore, we use a parameter, SWR, to estimate the proportion of the workload which can be scheduled statically, thus alleviating the effect of irregular workloads. Experimental results on a grid testbed show that the proposed approach can reduce the completion time for applications with regular or irregular workloads. Consequently, we claim that parallel loop scheduling can benefit applications on grid environments. 相似文献

17.

A compile-time scheduling heuristic for interconnection-constrainedheterogeneous processor architectures

Sih G.C. Lee E.A. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(2):175-187

The authors present a compile-time scheduling heuristic called dynamic level scheduling, which accounts for interprocessor communication overhead when mapping precedence-constrained, communicating tasks onto heterogeneous processor architectures with limited or possibly irregular interconnection structures. This technique uses dynamically-changing priorities to match tasks with processors at each step, and schedules over both spatial and temporal dimensions to eliminate shared resource contention. This method is fast, flexible, widely targetable, and displays promising performance 相似文献

18.

On runtime parallel scheduling for processor load balancing 总被引：3，自引：0，他引：3

Min-You Wu 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(2):173-186

Parallel scheduling is a new approach for load balancing. In parallel scheduling, all processors cooperate to schedule work. Parallel scheduling is able to accurately balance the load by using global load information at compile-time or runtime. It provides high-quality load balancing. This paper presents an overview of the parallel scheduling technique. Scheduling algorithms for tree, hypercube, and mesh networks are presented. These algorithms can fully balance the load and maximize locality at runtime. Communication costs are significantly reduced compared to other existing algorithms 相似文献

19.

VLIW处理器循环指令缓冲器设计与实现

李勇胡慧俐杨焕荣《计算机应用》2014,34(4):1005-1009

数字信号处理软件中循环程序在执行时间上占有很大比例,用指令缓冲器暂存循环代码可以减少程序存储器的访问次数,提高处理器性能。在VLIW处理器指令流水线中增加一个支持循环指令的缓冲器,该缓冲器能够缓存循环程序指令,并以软件流水的形式向功能部件派发循环程序指令。这样循环程序代码只需访存一次而执行多次,大大减少了访存次数。在循环指令运行期间,缓冲器发出信号使程序存储器进入睡眠状态可以降低处理器功耗。典型的应用程序测试表明,使用了循环缓冲后,取指流水线空闲率可达90%以上,处理器整体性能提高10%左右,而循环缓冲的硬件面积开销大约占取指流水线的9%。相似文献

20.

一种面向VLIW芯片的线性指令调度算法

甘玲汤睿《微计算机信息》2009,25(2)

VLIW是DSP芯片上使用最多的一种技术,要发挥DSP芯片的性能优势,需要编译器的支持.目前关于VLLW技术的研究主要集中在如何形成更长的基本块,以及基本块之间的代码优化算法上,对于如何选择指令从而形成一个超长指令字的算法,却没有仔细地描述和实现,但这是在编译器的指令调度模块中需要具体考虑的问题,具有工程实践意义.本文通过改进编译器的lisf算法实现了支持VLIW技术的指令调度优化算法,改进的算法可以充分利用芯片的VLIW结构的优势,加速程序运行,具有较好性能. 相似文献