期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

林海波李文龙汤志忠《计算机研究与发展》2004,41(1):22-27

软件流水是开发循环程序指令级并行性的重要方法之一，IA-64是支持软件流水的EPIC体系结构，通过对NAS Benchmarks中可软件流水循环所需的寄存器进行量化分析，提出了一种限制循环展开因子的启发式算法，有效地解决了因可用寄存器不足而导致软件流水失败的问题，并提高了应用程序的执行速度。相似文献

2.

一种支持多重循环软件流水的寄存器结构 总被引：1，自引：0，他引：1

容红波汤志忠《软件学报》2000,11(3):401-409

寄存器结构及其分配是软件流水算法的关键之一.为支持多重循环的软件流水,该文提出一种新颖的寄存器结构：半共享跳跃式流水寄存器堆.它可以有效地解决多重循环软件流水下的特殊问题,即：同层次和跨层次的寄存器重命名问题以及断流问题;有效地消除外层循环的体间读写相关,提高程序的指令级并行度.它有3种分配方式可供灵活使用：单个寄存器、流水寄存器和寄存器组方式.流水寄存器方式对生存期确定的、局限于一个循环层次的寄存器重命名问题提供简单而有效的支持.寄存器组分配方式解决了多重循环软件流水时变量生存期不确定的情况.跳跃操作为相似文献

3.

基于依赖环问题的改进软流水框架

《计算机工程与应用》2017,(17):65-69

软件流水是编译后端优化中针对循环的调度技术,在软件流水优化过程中,依赖环是影响软件流水优化的重要因素。针对循环体中依赖环导致软件流水失败的问题,通过对循环中的依赖环进行分析处理,基于传统的模调度框架,提出了改进的软件流水优化算法,对于造成依赖环的寄存器引入多个分量,实现了对含有归约变量循环的流水。通过典型的算法测试,实验结果表明,该框架能够使得更多类型的循环流水成功,对于循环核心性能提升至少58%。相似文献

4.

摆动模调度中的寄存器溢出技术及其在GCC中的实现

杨旸顾国昌《小型微型计算机系统》2007,28(10):1822-1826

软件流水是一种通过发掘循环的不同迭代的不同部分的指令间并行性,使这些指令并行执行,从而提高循环的执行效率的优化技术.但该技术在提高指令并行性的同时也增加了寄存器压力,而寄存器溢出技术正是解决寄存器压力的有效方法.摆动模调度是一种在进行近似最优化调度的同时尽力减小寄存器压力的软件流水算法,该算法已经作为一个新的优化遍出现在GCC的最新版本中.本文以GCC为平台,论述了摆动模调度中的寄存器溢出技术及其工程实现,从而使摆动模调度算法进一步增强了对寄存器压力的处理能力. 相似文献

5.

软件流水的开销模型和决策框架 总被引：1，自引：0，他引：1

下载免费PDF全文

李文龙林海波汤志忠《软件学报》2004,15(7):1005-1011

软件流水是一种重要的指令调度技术,它通过重叠地执行不同的循环体来提高指令级并行性(instruction level parallelism,简称ILP).模调度是一类被广泛采用的软件流水调度算法.软件流水并非一种无损的优化方法,它具有一定的开销,比如延长了编译时间、增加了寄存器压力等.而且,受到体系结构、调度算法以及程序特性的限制,进行软件流水并不一定能达到理想的加速比,有时反而会引起性能下降.提出了一种面向程序特性的软件流水开销模型,对此模型下的软件流水开销进行了量化分析,并提出了一种基于相关性分析的相似文献

6.

GURPR—一种全局软件流水方法

苏伯珙丁士元《计算机学报》1989,12(9):663-673

软件流水技术是对程序及微程序中的循环进行优化的一种有效方法,可对基本块构成的循环体进行软件流水的LURPR算法已取得令人满意的效果。本文将在LURPR法的基础上,把软件流水技术扩展到任意结构的循环体,并给出相应的GURPR算法,GURPR算法可对任意的含非正常入口、条件出口、支路、循环嵌套及子程序调用的循环体进行软件流水。相似文献

7.

基于EPIC同时多线程处理器的寄存器堆设计

下载免费PDF全文

黄彩霞《计算机工程与科学》2009,31(10)

在体现EPIC设计思想的Itanium微处理器中,寄存器堆的管理是通过寄存器堆栈引擎(RSE)技术实现的。EPIC硬件简单,动态同时多线程(DSMT)易于开发线程级并行,针对结合二者优点的EDSMT微体系结构,我们提出一种基于映射表的寄存器堆管理方法—MTRSE。该方法兼容Itanium体系结构,支持同时多线程,并提高了寄存器资源使用效率。实验表明,当线程数为3或4时,该方法对于寄存器资源有40%使用效率的提升。相似文献

8.

URPR——一种实现软件流水技术的方法 总被引：2，自引：0，他引：2

苏伯珙丁士元《计算机学报》1988,(5)

软件流水技术是对AP数组处理机循环程序进行优化的一种有效方法.本文介绍一种在微代码循环压缩URCR算法基础上研究的URPR算法.首先对循环体进行展开,展开的个数取决于循环体之间的数据相关程度,然后将展开后的循环体逐个进行安放,最后进行收拢得到一个优化后的新循环体.初步实验验证了URPR比目前现有一些方法具有优越性. 相似文献

9.

软件流水中隐藏存储延迟的方法 总被引：3，自引：2，他引：3

刘利李文龙陈彧李胜梅汤志忠《软件学报》2005,16(10):1833-1841

软件流水是一种重要的指令调度技术,它通过同时执行来自不同循环体的指令来加快循环的执行速度.随着处理机运行速度的逐渐提高,存储访问延迟成为性能提高的瓶颈.为了减轻存储系统影响,软件流水结合了一些存储优化技术,通过隐藏存储延迟来提高性能.提出了一种延迟可预测的模调度算法(foresighted latencymodulo scheduling,简称FLMS),它根据循环的特点来确定load指令延迟.实验结果表明,FLMS算法减少了阻塞时间,提高了程序性能. 相似文献

10.

IA-64软件流水中旋转寄存器逆向分析技术

下载免费PDF全文

汪淼赵荣彩蔡国明丁志芳《计算机工程》2009,35(2):1-3

安腾（IA-64）提供的旋转寄存器机制使软件流水代码难于理解、调试和移植,在分析IA-64旋转寄存器机制的基础上,提出一种旋转寄存器逆向分析方法。该方法通过分析软件流水阶段计算旋转间距,由旋转间距识别出流水代码中的旋转相关寄存器。将该方法应用于静态二进制编译系统12A中,通过实验证明能够有效消除旋转寄存器对二进制翻译带来的影响。相似文献

11.

一种IA-64下的反软件流水算法

下载免费PDF全文

汪淼赵荣彩蔡国明《计算机工程与应用》2007,43(23):58-60

软件流水是一种循环程序的优化技术,它可以有效地提高指令级并行性。由于处理机的实现方法各不相同,在一种处理机上经过软件流水优化后的循环代码很难在其它处理机中移植和使用。反软件流水是软件流水的逆向操作,它可以消除循环代码中的软件流水特性,以便于代码在不同平台上的移植。基于IA-64体系结构,分析了软件流水的代码特点,提出了反流水算法,用于将ICC编译器编译后的可执行二进制代码消除软件流水特性,转换成语义等价的C代码。相似文献

12.

Quantitative Evaluation of Register Pressure on Software Pipelined Loops 总被引：3，自引：0，他引：3

Josep Llosa Eduard Ayguadé Mateo Valero 《International journal of parallel programming》1998,26(2):121-142

Software Pipelining is a loop scheduling technique that extracts loop parallelism by overlapping the execution of several consecutive iterations. One of the drawbacks of software pipelining is its high register requirements, which increase with the number of functional units and their degree of pipelining. This paper analyzes the register requirements of software pipelined loops. It also evaluates the effects on performance of the addition of spill code. Spill code is needed when the number of registers required by the software pipelined loop is larger than the number of registers of the target machine. This spill code increases memory traffic and can reduce performance. Finally, compilers can apply transformations in order to reduce the number of memory accesses and increase functional unit utilization. The paper also evaluates the negative effect on register requirements that some of these transformations might produce on loops. 相似文献

13.

Register constrained modulo scheduling

Zalamea J. Llosa J. Ayguade E. Valero M. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(5):417-430

Software pipelining is an instruction scheduling technique that exploits the instruction level parallelism (ILP) available in loops by overlapping operations from various successive loop iterations. The main drawback of aggressive software pipelining techniques is their high register requirements. If the requirements exceed the number of registers available in the target architecture, some steps need to be applied to reduce the register pressure (incurring some performance degradation): reduce iteration overlapping or spilling some lifetimes to memory. In the first part, we propose a set of heuristics to improve the spilling process and to better decide between adding spill code or directly decreasing the execution rate of iterations. The experimental evaluation, over a large number of representative loops and for a processor configuration, reports an increase in performance by a factor of 1.29 and a reduction of memory traffic by a factor of 1.36. In the second part, we analyze the use of backtracking and propose a novel approach for simultaneous instruction scheduling and register spilling in modulo scheduling: MIPS (modulo scheduling with integrated register spilling). The experimental evaluation reports an increase in performance by a factor of 1.46 and a reduction of the memory traffic by a factor of 1.66 (or an additional 1.13 and 1.22 with regard to the proposal in the first part). These improvements are achieved at the expense of a reasonable increase in the compilation time. 相似文献

14.

Time Optimal Software Pipelining of Loops with Control Flows

Han-Saem Yun Jihong Kim Soo-Mook Moon 《International journal of parallel programming》2003,31(5):339-391

Software pipelining is widely used as a compiler optimization technique to achieve high performance in machines that exploit instruction-level parallelism. However, surprisingly, there have been few theoretical or empirical results on time optimal software pipelining of loops with control flows. In this paper, we present three new theoretical and practical contributions for this underinvestigated problem. First, we propose a necessary and sufficient condition for a loop with control flows to have an optimally software-pipelined program. We also present a decision procedure to compute the condition. As part of the formal treatment of software pipelining, we propose a new formalization of software pipelining. Second, we present two software pipelining algorithms. The first algorithm computes an optimal solution for every loop satisfying the condition, but may run in exponential time. The second algorithm computes optimal solutions efficiently for most (but not all) loops satisfying the condition. The former one proves the sufficiency of the condition and the latter one suggests a practical optimal software pipelining algorithm. Third, we present experimental results which strongly indicate that achieving the time optimality in the software-pipelined programs is a viable goal in practice with reasonable hardware support. 相似文献

15.

Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures 总被引：1，自引：0，他引：1

Javier Zalamea Josep Llosa Eduard Ayguadé Mateo Valero 《International journal of parallel programming》2004,32(6):447-474

High-performance microprocessors are currently designed with the purpose of exploiting instruction level parallelism (ILP). The techniques used in their design and the aggressive scheduling techniques used to exploit this ILP tend to increase the register requirements of the loops. This paper reviews hardware and software techniques that alleviate the high register demands of aggressive scheduling heuristics on VLIW cores. From the software point of view, instruction scheduling can stretch lifetimes and reduce the register pressure. If more registers than those available in the architecture are required, some actions (such as the injection of spill code) have to be applied to reduce this pressure, at the expense of some performance degradation. From the hardware point of view, this degradation could be reduced if a high-capacity register file were included without causing a negative impact on the design of the processor (cycle time, area and power dissipation). Novel organizations for the register file based on clustering and hierarchical organization are necessary to meet the technology constraints. This paper proposes the used of a clustered organization and proposes an aggressive instruction scheduling technique that minimizes the negative effect of the limitations imposed by the register file organization. 相似文献

16.

Optimizing modulo scheduling to achieve reuse and concurrency for stream processors

Li Wang Jingling Xue Xuejun Yang 《The Journal of supercomputing》2012,59(3):1229-1251

Both reuse and concurrency are performance-critical for stream processors. When applying loop unrolling and software pipelining separately to stream-level loops, either reuse or concurrency or both may be inadequately exploited. In this paper, we optimize modulo scheduling to maximize stream reuse and improve concurrency for stream-level loops. The key insight is that an unrolled and software-pipelined stream-level loop could be described by a set of reuse equations. Guided by reuse equations, a reuse-aware modulo scheduling algorithm is developed to simultaneously optimize the two performance objectives, reuse, and concurrency, for a loop in a unified framework. Moreover, we describe a code generation algorithm to automatically produce the optimized loop from a given loop. The experimental results obtained on FT64 and by simulation demonstrate the effectiveness of the proposed approach. 相似文献

17.

Software pipelining of loops by the method of modulo scheduling

N. I. V’yukova V. A. Galatenko S. V. Samborskii 《Programming and Computer Software》2007,33(6):307-315

Software pipelining is an efficient method of loop optimization that allows for parallelism of operations related to different loop iterations. Currently, most commercial compilers use loop pipelining methods based on modulo scheduling algorithms. This paper reviews these algorithms and considers algorithmic solutions designed for overcoming the negative effects of software pipelining of loops (a significant growth in code size and increase in the register pressure) as well as methods making it possible to use some hardware features of a target architecture. The paper considers global-scheduling mechanisms allowing one to pipeline loops containing a few basic blocks and loops in which the number of iterations is not known before the execution. 相似文献