首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到17条相似文献,搜索用时 828 毫秒
1.
一种软件流水的反流水算法   总被引:1,自引:0,他引:1       下载免费PDF全文
软件流水是一种循环程序的优化技术,已经广泛应用于现代优化编译器中.为了充分利用VLIW DSP处理机的指令级并行性,必须使用软件流水技术对DSP程序进行优化.然而,在串行源代码不存在的情况下,对软件流水后的原始代码进行变换、理解、测试和调试,并转换成其他处理机的代码是非常困难的.提出了一种反流水技术,它能够将软件流水后的优化汇编代码反向转换成语义等价的相应代码.通过20个程序的初步实验,验证了所提出的反流水算法的正确性.  相似文献   

2.
IA-64体系结构使用软件流水提高程序的执行性能,但产生的二进制代码跟机器特性紧密相关,给代码跨平台移植造成了困难。该文针对IA-64体系结构下软件流水的特点,提出2种软件流水代码消除方法,它能够将软件流水代码转换成语义等价无硬件依赖的串行代码,实验验证了这2种方法的有效性。  相似文献   

3.
软件流水是一种开发循环程序指令级并行性的技术, 它通过并行执行连续的多个迭代来加快循环的执行速度。而在逆向工程中,软件流水却为逆向翻译带来了困难。为此,基于IA-64平台,提出了一种反流水算法,针对循环中包含软件流水的汇编代码进行处理,将其反向转换成语义等价的串行代码,并通过实验验证了该算法的有效性,为在二进制翻译中处理软件流水代码奠定了基础。  相似文献   

4.
循环是程序中的热代码,而软件流水是一种细粒度的循环优化方法,它通过将循环中不同迭代之间的操作并行执行,最大程度地开发指令级并行。模调度是一种效果很好的软件流水算法。论文以gcc3.3为基础,提出了模调度与DFA结合的软件流水方法,及其工程实现,实验数据表明,优化效果明显。  相似文献   

5.
循环是程序中的热代码,对循环进行有效的优化可以显著缩短程序的执行时间。软件流水是一种开发循环体指令级并行的细粒度循环优化技术,它通过调度循环中连续迭代之间的指令使其并行执行,从而提高了循环的执行效率。实验数据表明,用Cerngoop程序包进行测试,循环优化效果明显。  相似文献   

6.
投机优化技术作为一种先进的现代编译技术,有效地提高了指令执行的并行性。然而,在逆向工程中,有时要实现代码的跨平台移植,而投机优化技术又受硬件平台的制约;有时需要优化代码的结构,使程序的逻辑结构易于理解;这些都要求消除这种与硬件息息相关的优化技术。论文基于IA-64平台,提出了一种反投机处理算法,对ICC编译器编译后的可执行二进制代码进行处理,消除代码中的投机优化,将其转换成等价的没有投机优化的指令序列,这样使反投机后的代码更容易理解,而且在逆向工程中摆脱了硬件的限制。测试表明该反投机技术可以对ICC编译后的代码进行有效处理。  相似文献   

7.
安腾(IA-64)提供的旋转寄存器机制使软件流水代码难于理解、调试和移植,在分析IA-64旋转寄存器机制的基础上,提出一种旋转寄存器逆向分析方法。该方法通过分析软件流水阶段计算旋转间距,由旋转间距识别出流水代码中的旋转相关寄存器。将该方法应用于静态二进制编译系统12A中,通过实验证明能够有效消除旋转寄存器对二进制翻译带来的影响。  相似文献   

8.
具有条件分支的循环通过IF转换将显式的控制流转换为隐式的控制流,从而为指令调度提供进一步的机会.但它往往将程序的代码进行深度重构,增加了程序的理解和代码重建工作的复杂性.提出了一种软件流水循环中的隐式控制流恢复技术,用于重构软件流水循环中的条件分支,提高软件逆向工程中生成的目标代码的质量.  相似文献   

9.
一种支持多重循环软件流水的寄存器结构   总被引:1,自引:0,他引:1  
容红波  汤志忠 《软件学报》2000,11(3):401-409
寄存器结构及其分配是软件流水算法的关键之一.为支持多重循环的软件流水,该文提出一种新颖的寄存器结构:半共享跳跃式流水寄存器堆.它可以有效地解决多重循环软件流水下的特殊问题,即:同层次和跨层次的寄存器重命名问题以及断流问题;有效地消除外层循环的体间读写相关,提高程序的指令级并行度.它有3种分配方式可供灵活使用:单个寄存器、流水寄存器和寄存器组方式.流水寄存器方式对生存期确定的、局限于一个循环层次的寄存器重命名问题提供简单而有效的支持.寄存器组分配方式解决了多重循环软件流水时变量生存期不确定的情况.跳跃操作为  相似文献   

10.
软件流水中隐藏存储延迟的方法   总被引:5,自引:2,他引:3  
刘利  李文龙  陈彧  李胜梅  汤志忠 《软件学报》2005,16(10):1833-1841
软件流水是一种重要的指令调度技术,它通过同时执行来自不同循环体的指令来加快循环的执行速度.随着处理机运行速度的逐渐提高,存储访问延迟成为性能提高的瓶颈.为了减轻存储系统影响,软件流水结合了一些存储优化技术,通过隐藏存储延迟来提高性能.提出了一种延迟可预测的模调度算法(foresighted latencymodulo scheduling,简称FLMS),它根据循环的特点来确定load指令延迟.实验结果表明,FLMS算法减少了阻塞时间,提高了程序性能.  相似文献   

11.
Software pipelining is widely used as a compiler optimization technique to achieve high performance in machines that exploit instruction-level parallelism. However, surprisingly, there have been few theoretical or empirical results on time optimal software pipelining of loops with control flows. In this paper, we present three new theoretical and practical contributions for this underinvestigated problem. First, we propose a necessary and sufficient condition for a loop with control flows to have an optimally software-pipelined program. We also present a decision procedure to compute the condition. As part of the formal treatment of software pipelining, we propose a new formalization of software pipelining. Second, we present two software pipelining algorithms. The first algorithm computes an optimal solution for every loop satisfying the condition, but may run in exponential time. The second algorithm computes optimal solutions efficiently for most (but not all) loops satisfying the condition. The former one proves the sufficiency of the condition and the latter one suggests a practical optimal software pipelining algorithm. Third, we present experimental results which strongly indicate that achieving the time optimality in the software-pipelined programs is a viable goal in practice with reasonable hardware support.  相似文献   

12.
3种提高软件流水有效性的算法:比较和结合   总被引:1,自引:0,他引:1  
李文龙  陈彧  林海波  汤志忠 《软件学报》2005,16(10):1822-1832
软件流水是开发循环程序指令级并行性的技术,它通过并行执行连续的多个循环体来加快循环的执行速度.在软件流水中,循环体的重叠增加了寄存器需求,导致寄存器压力增大,当目标处理机所提供的寄存器不足时,软件流水可能失败.在Itanium处理机上评估了NAS和SPEC2000基准程序中的软件流水循环的寄存器需求,发现静态寄存器不足是造成软件流水失败的主要原因,提出了3种增加软件流水个数、提高软件流水有效性的算法:限制循环展开因子的算法(register sensitive unrolling,简称RSU)、堆栈寄存器分配算法(stacked registerallocation,简称SRA)以及变量类型转换的算法(variabletype conversion,简称VTC).RSU根据静态寄存器需求确定一个合理的展开因子,增加了软件流水的成功率;SRA和VTC分别使用空闲的堆栈寄存器和旋转寄存器来充当静态寄存器,提高了寄存器的利用率.在面向Itanium处理器的开放源码编译器ORC(open research compiler)上实现了这3种算法,通过NAS程序的测试比较了这3种算法的有效性,同时对它们的结合应用进行了研究和实验.  相似文献   

13.
Global software pipelining is a complex but efficient compilation technique to exploit instruction-level parallelism for loops with branches.This paper presents a novel global software pipelining technique,called Trace Software Pipelining,targeted to the instruction-level parallel processors such as Very Long Instruction Word (VLIW) and superscalar machines.Trace software pipelining applies a global code scheduling technique to compact the original loop body.The resulting loop is called a trace software pipelined (TSP) code.The trace softwrae pipelined code can be directly executed with special architectural support or can be transformed into a globally software pipelined loop for the current VLIW and superscalar processors.Thus,exploiting parallelism across all iterations of a loop can be completed through compacting the original loop body with any global code scheduling technique.This makes our new technique very promising in practical compilers.Finally,we also present the preliminary experimental results to support our new approach.  相似文献   

14.
Decomposed software pipelining: A new perspective and a new approach   总被引:1,自引:0,他引:1  
Software pipelining is an efficient instruction-level loop scheduling technique, but existing software pipelining approaches have not been widely used in practical and commercial compilers. This is mainly because resource constraints and the cyclic data dependencies make software pipelining very complicated and difficult to apply. In this paper we present a new perspective on software pipelining in which it is decomposed into two subproblems—one is free from cyclic data dependencies and can be effectively solved by the list scheduling technique, and the other is free from resource constraints and can be easily solved by classical polynomial-time algorithms of graph theory. Based on this new perspective, we develop a new instruction-level loop scheduling approach, call DEcomposed Software Pipelining (DESP).  相似文献   

15.
软件流水的开销模型和决策框架   总被引:1,自引:0,他引:1       下载免费PDF全文
李文龙  林海波  汤志忠 《软件学报》2004,15(7):1005-1011
软件流水是一种重要的指令调度技术,它通过重叠地执行不同的循环体来提高指令级并行性(instruction level parallelism,简称ILP).模调度是一类被广泛采用的软件流水调度算法.软件流水并非一种无损的优化方法,它具有一定的开销,比如延长了编译时间、增加了寄存器压力等.而且,受到体系结构、调度算法以及程序特性的限制,进行软件流水并不一定能达到理想的加速比,有时反而会引起性能下降.提出了一种面向程序特性的软件流水开销模型,对此模型下的软件流水开销进行了量化分析,并提出了一种基于相关性分析的  相似文献   

16.
Both reuse and concurrency are performance-critical for stream processors. When applying loop unrolling and software pipelining separately to stream-level loops, either reuse or concurrency or both may be inadequately exploited. In this paper, we optimize modulo scheduling to maximize stream reuse and improve concurrency for stream-level loops. The key insight is that an unrolled and software-pipelined stream-level loop could be described by a set of reuse equations. Guided by reuse equations, a reuse-aware modulo scheduling algorithm is developed to simultaneously optimize the two performance objectives, reuse, and concurrency, for a loop in a unified framework. Moreover, we describe a code generation algorithm to automatically produce the optimized loop from a given loop. The experimental results obtained on FT64 and by simulation demonstrate the effectiveness of the proposed approach.  相似文献   

17.
Software pipelining is an efficient method of loop optimization that allows for parallelism of operations related to different loop iterations. Currently, most commercial compilers use loop pipelining methods based on modulo scheduling algorithms. This paper reviews these algorithms and considers algorithmic solutions designed for overcoming the negative effects of software pipelining of loops (a significant growth in code size and increase in the register pressure) as well as methods making it possible to use some hardware features of a target architecture. The paper considers global-scheduling mechanisms allowing one to pipeline loops containing a few basic blocks and loops in which the number of iterations is not known before the execution.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号