1.

Decomposed software pipelining: A new perspective and a new approach (Cited by: 1; self-citations: 0; citations by others: 1)

Jian Wang Christine Eisenbeis Martin Jourdan Bogong Su 《International journal of parallel programming》1994,22(3):351-373

Software pipelining is an efficient instruction-level loop scheduling technique, but existing software pipelining approaches have not been widely used in practical and commercial compilers. This is mainly because resource constraints and cyclic data dependencies make software pipelining very complicated and difficult to apply. In this paper we present a new perspective on software pipelining in which it is decomposed into two subproblems: one is free from cyclic data dependencies and can be effectively solved by the list scheduling technique, and the other is free from resource constraints and can be easily solved by classical polynomial-time algorithms of graph theory. Based on this new perspective, we develop a new instruction-level loop scheduling approach, called DEcomposed Software Pipelining (DESP).
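To make the first subproblem concrete, here is a minimal sketch of list scheduling on an acyclic dependence graph. It is not the DESP algorithm itself: it assumes unit-latency operations, a number of identical functional units, and a priority order supplied by the caller. The function name `list_schedule` and its parameters are hypothetical.

```python
from collections import defaultdict


def list_schedule(ops, deps, num_units):
    """Greedy list scheduling (illustrative sketch, not DESP).

    ops: operation names in priority order (assumed input).
    deps: (pred, succ) edges of an ACYCLIC dependence graph whose
          endpoints all appear in `ops`.
    num_units: identical functional units; every op takes one cycle.
    Returns a dict mapping each op to its issue cycle.
    """
    preds = defaultdict(set)
    for u, v in deps:
        preds[v].add(u)

    start = {}
    cycle = 0
    while len(start) < len(ops):
        issued = 0
        for op in ops:
            if op in start or issued == num_units:
                continue
            # An op is ready once every predecessor issued in an earlier cycle.
            if all(p in start and start[p] < cycle for p in preds[op]):
                start[op] = cycle
                issued += 1
        cycle += 1
    return start


if __name__ == "__main__":
    edges = [("a", "c"), ("b", "c"), ("c", "d")]
    print(list_schedule(["a", "b", "c", "d"], edges, num_units=2))
```

With two units, `a` and `b` share cycle 0; with one unit the schedule is fully serialized.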

2.

Circuit retiming applied to decomposed software pipelining

Calland P.-Y. Darte A. Robert Y. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(1):24-35

This paper elaborates on a new view of software pipelining, called decomposed software pipelining. The approach is to decouple the problem into resource constraints and dependence constraints. Managing the resource constraints amounts to scheduling an acyclic graph subject to resource constraints, for which an efficiency bound is known, resulting in a bound for loop scheduling. The acyclic graph is obtained by cutting some particular edges of the (cyclic) dependence graph. In this paper, we cut edges in a different way, using circuit retiming algorithms, so as to minimize both the longest dependence path in the acyclic graph and the number of edges in the acyclic graph. With this technique, we improve the efficiency bound given for the Gasperoni and Schwiegelshohn algorithm, and we reduce the constraints that remain for the acyclic problem. We believe this framework to be of interest because it brings a new insight into the software pipelining problem by establishing its deep link with the circuit retiming problem.
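The edge-cutting idea can be sketched with the standard retiming rule for an edge u → v of weight w: the retimed weight is w + r(v) − r(u). The code below is an illustration under assumptions, not the paper's algorithm: edge weights are taken to be dependence distances in iterations, the zero-weight edges are taken as the acyclic graph, and the retiming `r` is chosen by hand rather than computed. All function names are hypothetical.

```python
def retime(edges, r):
    """Apply a retiming r to edges given as (u, v, w) triples.

    Each edge u -> v gets weight w + r[v] - r[u]. A negative result
    means the retiming is illegal.
    """
    out = []
    for u, v, w in edges:
        w_r = w + r[v] - r[u]
        if w_r < 0:
            raise ValueError(f"illegal retiming on edge {u}->{v}")
        out.append((u, v, w_r))
    return out


def zero_edges(edges):
    """Edges of weight 0: the acyclic part kept after cutting."""
    return [(u, v) for u, v, w in edges if w == 0]


def longest_chain(dag_edges):
    """Number of nodes on the longest path of an acyclic edge list."""
    succ = {}
    for u, v in dag_edges:
        succ.setdefault(u, []).append(v)
    memo = {}

    def depth(n):
        if n not in memo:
            memo[n] = 1 + max((depth(s) for s in succ.get(n, [])), default=0)
        return memo[n]

    nodes = {n for e in dag_edges for n in e}
    return max((depth(n) for n in nodes), default=0)


if __name__ == "__main__":
    # One dependence cycle a -> b -> c -> d -> a with total distance 2.
    edges = [("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("d", "a", 2)]
    print(longest_chain(zero_edges(edges)))          # before retiming
    moved = retime(edges, {"a": 0, "b": 0, "c": 1, "d": 1})
    print(moved, longest_chain(zero_edges(moved)))   # after retiming
```

In this toy cycle, the retiming moves one unit of distance from edge d → a onto edge b → c, which shortens the longest zero-weight chain from four nodes to two and leaves two zero-weight edges instead of three.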

3.

Resource-constrained pipeline scheduling on FPGAs (Cited by: 1; self-citations: 0; citations by others: 1)

宋健葛颖增窦勇《计算机工程》2008,34(15):44-46

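Loops are among the most time-consuming parts of a program. Pipelining can accelerate loop execution but requires a large amount of computing resources. Because FPGA resources are limited, hand-designing pipelines is not practical when accelerating loop code on an FPGA. This paper uses software pipelining to map loops onto the FPGA automatically and performs pipeline scheduling under resource constraints. By exploring all or part of the space of resource combinations, a design with a good balance between performance and area can be selected.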

4.

The loop-carried dependence problem and an improved URPR software pipelining method

苏伯珙王剑《计算机学报》1992,15(7):499-506

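This paper first analyzes theoretically the effect of loop-carried dependences on software pipelining. It gives a necessary and sufficient condition, determined by the properties of the loop itself, and proves that loops satisfying this condition are restrictable while all others are unrestrictable. Second, it proves that any unrestrictable loop becomes restrictable after being unrolled K times, where K depends on the properties of the loop itself. Finally, it presents a loop preprocessing algorithm and a new loop-body compaction algorithm. Experimental results show that with these two algorithms the URPR algorithm achieves optimal time efficiency on arbitrary loops while retaining good space efficiency and low computational complexity.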

5.

Converging to periodic schedules for cyclic scheduling problems with resources and deadlines

《Computers & Operations Research》2014

Cyclic scheduling has been widely studied because of the importance of its applications in manufacturing systems and in computer science. For this class of problems, a finite set of tasks with precedence relations and resource constraints must be executed repetitively while maximizing the throughput. Many applications also require that execution schedules be periodic, i.e., the execution of each task is repeated with a fixed global period w. The present paper develops a new method to build periodic schedules with cumulative resource constraints, periodic release dates and deadlines. The main idea is to fix the period w, to unwind the cyclic scheduling problem for some number of iterations, and to add precedence relations so that the minimum time lag between two successive executions of any task equals w. Then, using any usual (not cyclic) scheduling algorithm to compute task starting times for the unwound problem, we prove that either the method converges to a periodic schedule of period w or it fails to compute a schedule. A non-polynomial upper bound on the number of iterations to unwind in order to guarantee that cyclic precedence relations and resource constraints are fulfilled is also provided. This method is successfully applied to a real-life problem, namely the software pipelining of inner loops on an embedded VLIW processor core by using a Graham list scheduling algorithm.
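As an illustration of what a periodic schedule of period w means, the sketch below checks only the precedence side of the problem. It assumes that execution k of task i starts at start[i] + k·w and that a precedence (i, j, h) requires execution k of i to finish before execution k + h of j starts. Resources, release dates and deadlines are ignored, and the unwinding method of the paper is not implemented. The names are hypothetical.

```python
def periodic_precedences_hold(start, dur, precedences, w):
    """Check uniform precedence constraints for a periodic schedule.

    start: {task: start time of its first execution}
    dur:   {task: processing time}
    precedences: (i, j, h) triples; h is the iteration distance
                 (0 = same iteration, 1 = next iteration, ...).
    w: the fixed global period.

    Execution k of task t starts at start[t] + k * w, so the condition
    start[i] + k*w + dur[i] <= start[j] + (k + h)*w is independent of k.
    """
    return all(
        start[i] + dur[i] <= start[j] + h * w
        for i, j, h in precedences
    )


if __name__ == "__main__":
    start = {"A": 0, "B": 2}
    dur = {"A": 2, "B": 3}
    # A feeds B in the same iteration; B feeds A of the next iteration.
    precedences = [("A", "B", 0), ("B", "A", 1)]
    for w in (4, 5):
        print(w, periodic_precedences_hold(start, dur, precedences, w))
```

In this toy instance the loop-carried constraint forces w to be at least 5, the total processing time around the cycle.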

6.

Resource-constrained software pipelining

Aiken A. Nicolau A. Novack S. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(12):1248-1270

This paper presents a software pipelining algorithm for the automatic extraction of fine-grain parallelism in general loops. The algorithm accounts for machine resource constraints in a way that smoothly integrates the management of resource constraints with software pipelining. Furthermore, generality in the software pipelining algorithm is not sacrificed to handle resource constraints, and scheduling choices are made with truly global information. Proofs of correctness and the results of experiments with an implementation are also presented.

7.

Software pipelining based on path grouping and data dependence relaxation

容红波汤志忠《软件学报》2001,12(4):544-555

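Software pipelining is an important loop scheduling method, but pipelining loops with branches remains a hard problem. Existing algorithms fall into four classes: loop linearization, path separation, global scheduling, and path selection. None of them reconciles two opposing problems: minimizing transition time and the worst-case constraint problem. This paper proposes a software pipelining framework based on path grouping and data dependence relaxation that attempts to solve both without conflict. Its main ideas are: (1) path grouping, which groups paths by their execution probabilities and transition probabilities in order to minimize transition time; and (2) data dependence relaxation, which tries to avoid the worst-case constraint: when a loop has multiple paths, some dependences do not necessarily have instances during loop execution, and the ideal strategy is to honor a dependence only when it has an instance. Preliminary experiments and qualitative analysis show that this ...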

8.

An improved software pipelining framework addressing the dependence cycle problem

张仁高郑启龙王向前韩东科《计算机工程与应用》2017,53(17):65-69

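Software pipelining is a loop scheduling technique used in compiler back-end optimization, and dependence cycles are an important factor affecting it. To address software pipelining failures caused by dependence cycles in loop bodies, this paper analyzes and handles the dependence cycles in a loop and, building on the traditional modulo scheduling framework, proposes an improved software pipelining algorithm. It introduces multiple components for each register that causes a dependence cycle, which enables pipelining of loops containing reduction variables. Tests on typical algorithms show that the framework lets more types of loops be pipelined successfully and improves loop kernel performance by at least 58%.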

9.

An optimization method to avoid cache penalty in modulo scheduling (Cited by: 1; self-citations: 0; citations by others: 1)

刘利李文龙郭振宇李胜梅汤志忠《软件学报》2005,16(10):1842-1852

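Software pipelining speeds up loop execution, and modulo scheduling is a widely adopted heuristic for software pipelining. To improve the memory system, caches are organized hierarchically, but this introduces additional memory latency, the cache penalty. This paper proves that modulo scheduling can incur cache penalties and proposes the PCPMS (prevent cache penalty in modulo scheduling) algorithm to avoid them. Experimental results show that PCPMS avoids cache penalties in modulo scheduling and improves program performance.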
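For readers unfamiliar with modulo scheduling, the following sketch shows only its basic resource bookkeeping, the modulo reservation table: with initiation interval II, an operation issued at cycle t occupies its resource in row t mod II. This is generic modulo scheduling, not the PCPMS algorithm, and it models neither caches nor dependences. The function name, the resource names and the capacities are hypothetical.

```python
def try_place(mrt, ii, cycle, resource, capacity):
    """Try to reserve `resource` for an operation issued at `cycle`.

    mrt: dict used as the modulo reservation table, keyed by
         (cycle % ii, resource) and holding the number of reservations.
    ii: initiation interval (cycles between successive iterations).
    capacity: {resource: number of units available per cycle}.
    Returns True and records the reservation if a unit is free in the
    row `cycle % ii`; returns False and leaves the table unchanged otherwise.
    """
    row = cycle % ii
    used = mrt.get((row, resource), 0)
    if used >= capacity[resource]:
        return False
    mrt[(row, resource)] = used + 1
    return True


if __name__ == "__main__":
    mrt = {}
    capacity = {"mem": 1}  # assumed: one memory port
    for cycle in (0, 2, 3, 1):
        print(cycle, try_place(mrt, 2, cycle, "mem", capacity))
```

With II = 2 and a single memory port, cycles 0 and 2 fall in the same row and therefore conflict, which is why the second request is rejected while the request at cycle 3 succeeds.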

10.

Novel Neighborhood Search for Multiprocessor Scheduling with Pipelining

《Journal of Parallel and Distributed Computing》2002,62(1):85-110

This paper presents a neighborhood search algorithm for heterogeneous multiprocessor scheduling in which loop pipelining is used to exploit parallelism between iterations. The method adopts a realistic model for interprocessor communication where resource contention is taken into consideration. The schedule representation scheme is flexible so that communication scheduling can be performed in a generic manner. Based on a general time formulation of the schedule performance, the algorithm improves an initial schedule in an efficient way by successive modification of the task-processor mapping and task ordering. Simulation results show that significant improvement over existing methods can be obtained. A parallel software video encoder was implemented based on the scheduling result, and real-time performance was achieved with pipelining of frame encoding.

11.

Trace Software Pipelining

Wang Jian Andreas Krall M. Anton Ertl 《计算机科学技术学报》1995,10(6):481-490

Global software pipelining is a complex but efficient compilation technique to exploit instruction-level parallelism for loops with branches. This paper presents a novel global software pipelining technique, called Trace Software Pipelining, targeted at instruction-level parallel processors such as Very Long Instruction Word (VLIW) and superscalar machines. Trace software pipelining applies a global code scheduling technique to compact the original loop body. The resulting loop is called trace software pipelined (TSP) code. The trace software pipelined code can be directly executed with special architectural support or can be transformed into a globally software pipelined loop for current VLIW and superscalar processors. Thus, exploiting parallelism across all iterations of a loop can be completed by compacting the original loop body with any global code scheduling technique. This makes our new technique very promising for practical compilers. Finally, we also present preliminary experimental results to support our new approach.

12.

Eliminating redundant loop-carried flow dependences on VLIW architectures (Cited by: 1; self-citations: 1; citations by others: 1)

容红波汤志忠《软件学报》2000,11(1):126-132

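Data dependence is the fundamental basis of parallel processing. This paper points out that the lockstep property peculiar to VLIW (very long instruction word) machines gives their data dependence analysis distinctive characteristics. Flow dependences at the same iteration distance form a linearly ordered set, and containment relations also exist among the characteristic flow dependences at multiple iteration distances. On this basis, the paper proposes a method for eliminating redundant loop-carried flow dependences on VLIW. The method is complete: it removes all redundant loop-carried flow dependences, thereby lightening the burden on loop scheduling. The paper gives necessary and sufficient conditions for redundancy at a single iteration distance and at multiple iteration distances, together with linear-complexity algorithms for eliminating it. The method is general and can serve as a basis for software pipelining and multiple-instruction-stream scheduling on VLIW.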

13.

Task migration and scheduling in mobile cloudlets

徐红霞孔志周《计算机应用与软件》2020,37(3):101-108,137

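To solve the problem of energy-efficient task scheduling under deadline constraints in mobile cloudlets, this paper proposes a distributed task scheduling algorithm based on adaptive probability. The algorithm has two phases: resource discovery and adaptive probabilistic scheduling. In the first phase, a modified QoS OLSR protocol lets the source node that issues task execution requests periodically collect resource information from neighboring processing nodes. In the second phase, based on the task arrival rate at the source node, the best processing node is selected probabilistically to execute each task, achieving the best energy efficiency while meeting the time constraint. Validation across a large number of simulation scenarios shows that the algorithm maintains a high task completion rate while reducing the average energy consumed per completed task.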

14.

A data speculation method in modulo scheduling

钟明郭振宇汤志忠《计算机应用与软件》2005,22(10):14-16

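Software pipelining is an important instruction scheduling technique that increases instruction-level parallelism by overlapping the execution of different loop iterations. Modulo scheduling is an important class of software pipelining algorithms, but conservative dependence analysis can introduce many ambiguous dependences, which prevents modulo scheduling from producing efficient schedules. Data speculation overcomes the scheduling restrictions imposed by conservative dependence analysis and exposes latent parallelism. This paper proposes a data speculation method for modulo scheduling, implements it in the open-source compiler ORC, and evaluates it with the SPEC2000 benchmarks. The experimental results show that the method works well.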

15.

Specification of software pipelining using Petri nets (Cited by: 1; self-citations: 0; citations by others: 1)

M. Rajagopalan V. H. Allan 《International journal of parallel programming》1994,22(3):273-301

This paper presents a flexible model for software pipelining using Petri nets. Our technique, called the Petri Net Pacemaker (PNP), can create near-optimal pipelines with less algorithmic effort than other techniques. The pacemaker is a novel idea which exploits the cyclic behavior of Petri nets to model the problem of scheduling operations of a loop body for software pipelining. A way of improving the performance of loops containing predicates is given. The PNP technique also shows how nested loops can be pipelined. A comparison with some of the other techniques is presented. This work was partially supported by the National Science Foundation under grants CDA-9100788 and CDA-9200371.

16.

A benefit-based method for optimizing software process resource scheduling (Cited by: 1; self-citations: 0; citations by others: 1)

颜海剑肖俊超《计算机应用研究》2008,25(11):3350-3353

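Software project managers must schedule the various resources in a software process optimally, but scheduling that relies on subjective judgment and personal experience is unstable and unreliable, so objective and dependable methods and tools for software process resource scheduling are needed. The benefit-based resource scheduling optimization method models resource scheduling in the software process, describes and defines the benefit produced by the resources invested, analyzes the constraints among activities, resources, and benefits in the process, and uses a dynamic-programming-based optimization algorithm to complete resource scheduling efficiently, so that resources are used effectively in the software process.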

17.

A faster branch-and-bound algorithm for the earliness-tardiness scheduling problem

Francis Sourd Safia Kedad-Sidhoum 《Journal of Scheduling》2008,11(1):49-58

This paper addresses the one-machine scheduling problem with earliness-tardiness penalties. We propose a new branch-and-bound algorithm that can solve instances with up to 50 jobs and that can solve problems with even more general non-convex cost functions. The algorithm is based on the combination of a Lagrangean relaxation of resource constraints and new dominance rules.
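To show what the earliness-tardiness objective looks like, here is a small sketch that evaluates a job sequence and finds the best one by brute force. It is not the branch-and-bound algorithm of the paper: it assumes linear penalties, no inserted idle time, and tiny instances where enumerating all permutations is affordable. The job data and function names are hypothetical.

```python
from itertools import permutations


def et_cost(sequence, jobs):
    """Total earliness-tardiness cost of running `sequence` back to back.

    jobs: {name: (processing_time, due_date, earliness_penalty,
                  tardiness_penalty)}; penalties are per time unit.
    Assumes the machine starts at time 0 and never idles.
    """
    t = 0
    cost = 0
    for name in sequence:
        p, d, alpha, beta = jobs[name]
        t += p  # completion time of this job
        cost += alpha * max(0, d - t) + beta * max(0, t - d)
    return cost


def best_sequence(jobs):
    """Exhaustive search over all orders; only viable for a few jobs."""
    return min(permutations(jobs), key=lambda seq: et_cost(seq, jobs))


if __name__ == "__main__":
    jobs = {"A": (2, 2, 1, 1), "B": (3, 5, 1, 2)}
    for seq in permutations(jobs):
        print(seq, et_cost(seq, jobs))
    print("best:", best_sequence(jobs))
```

Running A before B completes both jobs exactly on their due dates, whereas the reverse order finishes B early and A late.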

18.

Combining Loop Transformations Considering Caches and Scheduling

Michael E. Wolf Dror E. Maydan Ding-Kai Chen 《International journal of parallel programming》1998,26(4):479-503

The performance of modern microprocessors is greatly affected by cache behavior, instruction scheduling, register allocation and loop overhead. High-level loop transformations such as fission, fusion, tiling, interchanging and outer loop unrolling (e.g., unroll and jam) are well known to be capable of improving all these aspects of performance. Difficulties arise because these machine characteristics and these optimizations are highly interdependent. Interchanging two loops might, for example, improve cache behavior but make it impossible to allocate registers in the inner loop. Similarly, unrolling or interchanging a loop might individually hurt performance but doing both simultaneously might help performance. Little work has been published on how to combine these transformations into an efficient and effective compiler algorithm. In this paper, we present a model that estimates total machine cycle time taking into account cache misses, software pipelining, register pressure and loop overhead. We then develop an algorithm to intelligently search through the various possible transformations, using our machine model to select the set of transformations leading to the best overall performance. We have implemented this algorithm as part of the MIPSPro commercial compiler system. We give experimental results showing that our approach is both effective and efficient in optimizing numerical programs.

19.

Preemptive maintenance operation scheduling under complex human resource constraints

孙笑宋卫星班利明齐小刚《控制与决策》2022,37(2):393-400

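Because operation scheduling within a maintenance support system involves many operations, different types of maintenance personnel, and different personnel skill levels, this paper builds a multi-objective, multi-constraint optimization model whose objective functions are the shortest maintenance time and the minimum total human resource load. It designs a priority-value encoding based on the critical path algorithm as the first-layer encoding of the preemptive scheduling problem, generates the second-layer human resource encoding at random, and then, for a hybrid particle swarm and genetic algorithm, designs a crossover suited to preemptive scheduling...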

20.

A VLIW optimizing compiler using two-level software pipelining

苏伯珙王剑《计算机学报》1992,15(7):491-498,506

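This paper first proposes two-level software pipelining, a compilation technique that fully exploits the fine-grained, instruction-level parallelism of loop programs. The technique is based on the URPR software pipelining algorithm and tightly integrates resource allocation with code optimization. The paper then describes a VLIW optimizing compiler that uses two-level software pipelining, and finally gives an example of compiling an FFT inner loop together with preliminary experimental results.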