首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 93 毫秒
1.
一种支持多重循环软件流水的寄存器结构   总被引:1,自引:0,他引:1  
容红波  汤志忠 《软件学报》2000,11(3):401-409
寄存器结构及其分配是软件流水算法的关键之一.为支持多重循环的软件流水,该文提出一种新颖的寄存器结构:半共享跳跃式流水寄存器堆.它可以有效地解决多重循环软件流水下的特殊问题,即:同层次和跨层次的寄存器重命名问题以及断流问题;有效地消除外层循环的体间读写相关,提高程序的指令级并行度.它有3种分配方式可供灵活使用:单个寄存器、流水寄存器和寄存器组方式.流水寄存器方式对生存期确定的、局限于一个循环层次的寄存器重命名问题提供简单而有效的支持.寄存器组分配方式解决了多重循环软件流水时变量生存期不确定的情况.跳跃操作为  相似文献   

2.
3种提高软件流水有效性的算法:比较和结合   总被引:1,自引:0,他引:1  
李文龙  陈彧  林海波  汤志忠 《软件学报》2005,16(10):1822-1832
软件流水是开发循环程序指令级并行性的技术,它通过并行执行连续的多个循环体来加快循环的执行速度.在软件流水中,循环体的重叠增加了寄存器需求,导致寄存器压力增大,当目标处理机所提供的寄存器不足时,软件流水可能失败.在Itanium处理机上评估了NAS和SPEC2000基准程序中的软件流水循环的寄存器需求,发现静态寄存器不足是造成软件流水失败的主要原因,提出了3种增加软件流水个数、提高软件流水有效性的算法:限制循环展开因子的算法(register sensitive unrolling,简称RSU)、堆栈寄存器分配算法(stacked registerallocation,简称SRA)以及变量类型转换的算法(variabletype conversion,简称VTC).RSU根据静态寄存器需求确定一个合理的展开因子,增加了软件流水的成功率;SRA和VTC分别使用空闲的堆栈寄存器和旋转寄存器来充当静态寄存器,提高了寄存器的利用率.在面向Itanium处理器的开放源码编译器ORC(open research compiler)上实现了这3种算法,通过NAS程序的测试比较了这3种算法的有效性,同时对它们的结合应用进行了研究和实验.  相似文献   

3.
一种基于寄存器压力的VLIW DSP分簇算法   总被引:1,自引:0,他引:1  
寄存器是程序运行时最宝贵的资源之一,软件流水在对VLIW DSP指令调度的同时,会显著增加寄存器的压力,从而导致寄存器溢出,软件流水中止。在以往的研究中,软件流水之前的指令分簇会更多地考虑指令并行性,往往会把寄存器的压力交给寄存器分配阶段,当物理寄存器不够分配时会造成寄存器溢出。通过考察指令运行时的寄存器压力情况对指令进行分簇,这样可根据各个簇的寄存器压力的动态信息减少寄存器的溢出,提高指令运行效率。  相似文献   

4.
安腾(IA-64)提供的旋转寄存器机制使软件流水代码难于理解、调试和移植,在分析IA-64旋转寄存器机制的基础上,提出一种旋转寄存器逆向分析方法。该方法通过分析软件流水阶段计算旋转间距,由旋转间距识别出流水代码中的旋转相关寄存器。将该方法应用于静态二进制编译系统12A中,通过实验证明能够有效消除旋转寄存器对二进制翻译带来的影响。  相似文献   

5.
软件流水是一种通过发掘循环的不同迭代的不同部分的指令间并行性,使这些指令并行执行,从而提高循环的执行效率的优化技术.但该技术在提高指令并行性的同时也增加了寄存器压力,而寄存器溢出技术正是解决寄存器压力的有效方法.摆动模调度是一种在进行近似最优化调度的同时尽力减小寄存器压力的软件流水算法,该算法已经作为一个新的优化遍出现在GCC的最新版本中.本文以GCC为平台,论述了摆动模调度中的寄存器溢出技术及其工程实现,从而使摆动模调度算法进一步增强了对寄存器压力的处理能力.  相似文献   

6.
软件流水是编译后端优化中针对循环的调度技术,在软件流水优化过程中,依赖环是影响软件流水优化的重要因素。针对循环体中依赖环导致软件流水失败的问题,通过对循环中的依赖环进行分析处理,基于传统的模调度框架,提出了改进的软件流水优化算法,对于造成依赖环的寄存器引入多个分量,实现了对含有归约变量循环的流水。通过典型的算法测试,实验结果表明,该框架能够使得更多类型的循环流水成功,对于循环核心性能提升至少58%。  相似文献   

7.
软件流水的开销模型和决策框架   总被引:1,自引:0,他引:1       下载免费PDF全文
李文龙  林海波  汤志忠 《软件学报》2004,15(7):1005-1011
软件流水是一种重要的指令调度技术,它通过重叠地执行不同的循环体来提高指令级并行性(instruction level parallelism,简称ILP).模调度是一类被广泛采用的软件流水调度算法.软件流水并非一种无损的优化方法,它具有一定的开销,比如延长了编译时间、增加了寄存器压力等.而且,受到体系结构、调度算法以及程序特性的限制,进行软件流水并不一定能达到理想的加速比,有时反而会引起性能下降.提出了一种面向程序特性的软件流水开销模型,对此模型下的软件流水开销进行了量化分析,并提出了一种基于相关性分析的  相似文献   

8.
由于缺乏相关硬件功能,Open64编译器的软件流水技术没有面向X86处理器的版本。为此,提出一种适用于X86平台的Open64软件流水实现框架。利用软件实现处理器的部分硬件行为,通过循环过滤方法剔除不适用的循环。针对缺乏循环寄存器文件的问题,设计寄存器分配算法达到使用通用寄存器的目的,并添加模变量扩展模块以保证执行的正确性。实验结果表明,与循环展开方案相比,该框架可使系统平均获得9%的性能提升。  相似文献   

9.
本文叙述一个正在开发的VLIW多处理单元单片机,这个机器的体系结构基于URPR软件流水技术,采用了流水寄存器堆来减少体间相关距离,因此,细粒度并行性可得到充分开发,从而提高了循环体重叠程度,使得优化后的循环体的长度可大大缩短.模拟实验结果表明,这个体系结构在优化编译器的配合下可达到很高的性能。  相似文献   

10.
在DM642上对AVS变长解码部分进行了优化。针对指数哥伦布码的特性调整了存储器访问的方式,并通过对循环结构的调整及寄存器资源的合理分配使编译器能够进行高效的软件流水编排。经过优化后代码执行时间降低了70%以上,达到了标清尺寸实时解码要求。  相似文献   

11.
Global software pipelining is a complex but efficient compilation technique to exploit instruction-level parallelism for loops with branches.This paper presents a novel global software pipelining technique,called Trace Software Pipelining,targeted to the instruction-level parallel processors such as Very Long Instruction Word (VLIW) and superscalar machines.Trace software pipelining applies a global code scheduling technique to compact the original loop body.The resulting loop is called a trace software pipelined (TSP) code.The trace softwrae pipelined code can be directly executed with special architectural support or can be transformed into a globally software pipelined loop for the current VLIW and superscalar processors.Thus,exploiting parallelism across all iterations of a loop can be completed through compacting the original loop body with any global code scheduling technique.This makes our new technique very promising in practical compilers.Finally,we also present the preliminary experimental results to support our new approach.  相似文献   

12.
Quantitative Evaluation of Register Pressure on Software Pipelined Loops   总被引:3,自引:0,他引:3  
Software Pipelining is a loop scheduling technique that extracts loop parallelism by overlapping the execution of several consecutive iterations. One of the drawbacks of software pipelining is its high register requirements, which increase with the number of functional units and their degree of pipelining. This paper analyzes the register requirements of software pipelined loops. It also evaluates the effects on performance of the addition of spill code. Spill code is needed when the number of registers required by the software pipelined loop is larger than the number of registers of the target machine. This spill code increases memory traffic and can reduce performance. Finally, compilers can apply transformations in order to reduce the number of memory accesses and increase functional unit utilization. The paper also evaluates the negative effect on register requirements that some of these transformations might produce on loops.  相似文献   

13.
Software pipelining is widely used as a compiler optimization technique to achieve high performance in machines that exploit instruction-level parallelism. However, surprisingly, there have been few theoretical or empirical results on time optimal software pipelining of loops with control flows. In this paper, we present three new theoretical and practical contributions for this underinvestigated problem. First, we propose a necessary and sufficient condition for a loop with control flows to have an optimally software-pipelined program. We also present a decision procedure to compute the condition. As part of the formal treatment of software pipelining, we propose a new formalization of software pipelining. Second, we present two software pipelining algorithms. The first algorithm computes an optimal solution for every loop satisfying the condition, but may run in exponential time. The second algorithm computes optimal solutions efficiently for most (but not all) loops satisfying the condition. The former one proves the sufficiency of the condition and the latter one suggests a practical optimal software pipelining algorithm. Third, we present experimental results which strongly indicate that achieving the time optimality in the software-pipelined programs is a viable goal in practice with reasonable hardware support.  相似文献   

14.
Specification of software pipelining using petri nets   总被引:1,自引:0,他引:1  
This paper presents a flexible model for software pipelining using the petri nets. Our technique, called the Petri Net Pacemaker (PNP), can create near optimal pipelines with less algorithmic effort than other techniques. The pacemaker is a novel idea which exploits the cyclic behavior of petri nets to model the problem of scheduling operations of a loop body for software pipelining. A way of improving the performance of loops containing predicates is given. The PNP technique also shows how nested loops can be pipelined. A comparison with some of the other techniques is presented. THis work was partially supported by the National Science Foundation under grants CDA-9100788 and CDA-9200371.  相似文献   

15.
This paper presents a software pipelining algorithm for the automatic extraction of fine-grain parallelism in general loops. The algorithm accounts for machine resource constraints in a way that smoothly integrates the management of resource constraints with software pipelining. Furthermore, generality in the software pipelining algorithm is not sacrificed to handle resource constraints, and scheduling choices are made with truly global information. Proofs of correctness and the results of experiments with an implementation are also presented  相似文献   

16.
The basic concept of pipelined data-parallel algorithms is introduced by contrasting the algorithms with other styles of computation and by a simple example (a pipeline image distance transformation algorithm). Pipelined data-parallel algorithms are a class of algorithms which use pipelined operations and data level partitioning to achieve parallelism. Applications which involve data parallelism and recurrence relations are good candidates for this kind of algorithm. The computations are ideal for distributed-memory multicomputers. By controlling the granularity through data partitioning and overlapping the operations through pipelining, it is possible to achieve a balanced computation on multicomputers. An analytic model is presented for modeling pipelined data-parallel computation on multicomputers. The model uses timed Petri nets to describe data pipelining operations. As a case study, the model is applied to a pipelined matrix multiplication algorithm. Predicted results match closely with the measured performance on a 64-node NCUBE hypercube multicomputer  相似文献   

17.
Software pipelining increases the loop execution throughput by overlapping the execution of successive iterations in a pipelined fashion. For loops with control flows, however, software pipelining is not straightforward because we need to consider the overlap of more than one execution path. Modulo scheduling simply transforms them into straightline loops through if-conversion which, in effect, achieves a fixed, worst-case initiation interval (/spl par/) among all paths. On the other hand, all-path pipelining (APP) and enhanced pipeline scheduling (EPS) can achieve a variable /spl par/ depending on the path that is taken at execution time. Unfortunately, APP concentrates only on the overlap within the same path, entirely losing the overlap between different paths, whereas EPS attempts to overlap all paths together, failing to produce a tight schedule for each individual path, especially when resource constraints are tight. In this paper, we propose a new approach to EPS called split-path EPS (SP-EPS), which first splits each individual path via tail duplication and then performs EPS in a way to guarantee a tight schedule for each path, while producing a competitive cross-path schedule. We also extend SP-EPS to outer loops such that frequent paths that bypass the inner loop are split and then scheduled by SP-EPS. Our experimental results on nontrivial integer benchmarks show that SP-EPS can achieve as much as a geometric mean of 10 percent speedup over EPS when innermost loops are scheduled by SP-EPS, while it can achieve a geometric mean of 11.9 percent speedup when outer loops are also scheduled by SP-EPS.  相似文献   

18.
Software pipelining methods based on an ILP (integer linear programming) framework have been successfully applied to derive rate-optimal schedules under resource constraints. However, like many other previous works on software pipelining, ILP-based work has focused on resource constraints of simple function units, e.g., “clean pipelines”—pipelines without structural hazards. The problem for architectures beyond such clean pipelines remains open. One challenge is how to represent such resource constraints for unclean pipelines, i.e., pipelined function units, but having structural hazards.In this paper, we propose a method to constructrate-optimalsoftware pipelined schedules for pipelined architectures with structural hazards. A distinct feature of this work is that it provides a unified ILP framework for two challenging and interrelated aspects of software pipelining—the scheduling of instructions at particular times and the mapping of those instructions to specific function units. Solving both of these aspects is essential to finding schedules which will work both on VLIW machines which map instructions to fixed function units and on dynamic out-of-order superscalars. We propose two ILP formulations to solve the integrated scheduling and mapping problem. Both adopt principles of graph coloring in an ILP framework, and one usesforbidden latenciesin an elegant extension of classical hardware pipeline control theory.We have run experiments on four variations of our proposed formulations. As input we used a set of 415 “unique” loops taken from several benchmark suites, and we targeted an architecture whose function units contain many structural hazards. All four of our variations did well, with the best finding a rate-optimal schedule for 65% of the loops. This compares favorably with a leading heuristic, Huff'sSlack Scheduling—the ILP approaches found a schedule with smaller initiation interval for over 50% of the loops, with a mean improvement of almost 30%. Finally, we have found that reusing pipeline stages—and thus adding hazards—results in only a 10% drop in performance, while permitting significant savings in area.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号