1.

Wang Jian Andreas Krall M. Anton Ertl 《计算机科学技术学报》 (Journal of Computer Science and Technology) 1995,10(6):481-490

Global software pipelining is a complex but efficient compilation technique to exploit instruction-level parallelism for loops with branches. This paper presents a novel global software pipelining technique, called Trace Software Pipelining, targeted to instruction-level parallel processors such as Very Long Instruction Word (VLIW) and superscalar machines. Trace software pipelining applies a global code scheduling technique to compact the original loop body. The resulting loop is called trace software pipelined (TSP) code. The trace software pipelined code can be directly executed with special architectural support or can be transformed into a globally software pipelined loop for current VLIW and superscalar processors. Thus, exploiting parallelism across all iterations of a loop can be completed through compacting the original loop body with any global code scheduling technique. This makes our new technique very promising in practical compilers. Finally, we also present preliminary experimental results to support our new approach.

2.

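Register Spilling in Swing Modulo Scheduling and Its Implementation in GCC

Yang Yang (杨旸) Gu Guochang (顾国昌) 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2007,28(10):1822-1826

Software pipelining is an optimization that improves loop execution efficiency by exploiting parallelism among instructions from different parts of different iterations so that they execute in parallel. While it raises instruction-level parallelism, it also increases register pressure, and register spilling is an effective way to relieve that pressure. Swing modulo scheduling is a software pipelining algorithm that produces near-optimal schedules while trying to reduce register pressure; it has been added as a new optimization pass in the latest version of GCC. Using GCC as the platform, this paper describes register spilling in swing modulo scheduling and its engineering implementation, which further strengthens the algorithm's ability to handle register pressure.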

3.

Time Optimal Software Pipelining of Loops with Control Flows

Han-Saem Yun Jihong Kim Soo-Mook Moon 《International journal of parallel programming》2003,31(5):339-391

Software pipelining is widely used as a compiler optimization technique to achieve high performance in machines that exploit instruction-level parallelism. However, surprisingly, there have been few theoretical or empirical results on time optimal software pipelining of loops with control flows. In this paper, we present three new theoretical and practical contributions for this underinvestigated problem. First, we propose a necessary and sufficient condition for a loop with control flows to have an optimally software-pipelined program. We also present a decision procedure to compute the condition. As part of the formal treatment of software pipelining, we propose a new formalization of software pipelining. Second, we present two software pipelining algorithms. The first algorithm computes an optimal solution for every loop satisfying the condition, but may run in exponential time. The second algorithm computes optimal solutions efficiently for most (but not all) loops satisfying the condition. The former one proves the sufficiency of the condition and the latter one suggests a practical optimal software pipelining algorithm. Third, we present experimental results which strongly indicate that achieving the time optimality in the software-pipelined programs is a viable goal in practice with reasonable hardware support.

4.

Loop Shifting for Loop Compaction

Alain Darte Guillaume Huard 《International journal of parallel programming》2000,28(5):499-534

The idea of decomposed software pipelining is to decouple the software pipelining problem into a cyclic scheduling problem without resource constraints and an acyclic scheduling problem with resource constraints. In terms of loop transformation and code motion, the technique can be formulated as a combination of loop shifting and loop compaction. Loop shifting amounts to moving statements between iterations, thereby changing some loop-independent dependences into loop-carried dependences and vice versa. Then, loop compaction schedules the body of the loop considering only loop-independent dependences, but taking into account the details of the target architecture. In this paper, we show how loop shifting can be optimized so as to minimize both the length of the critical path and the number of dependences for loop compaction. The first problem is well-known and can be solved by an algorithm due to Leiserson and Saxe. We show that the second optimization (and the combination with the first one) is also polynomially solvable with a fast graph algorithm, a variant of minimum-cost flow algorithms. Finally, we analyze the improvements obtained on loop compaction by experiments on random graphs.
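
The loop shifting described in this abstract can be illustrated with a minimal Python sketch. The two-statement loop, the function names `original` and `shifted`, and the array names are illustrative assumptions, not taken from the paper. Shifting S1 forward by one iteration turns the loop-independent dependence S1 → S2 into a loop-carried one, so the two statements in the new body no longer depend on each other and a compaction pass could place them in the same cycle.

```python
def original(x):
    """Unshifted loop: S2 depends on S1 within the same iteration."""
    n = len(x)
    a = [0] * n
    b = [0] * n
    for i in range(n):
        a[i] = 2 * x[i]      # S1
        b[i] = a[i] + 1      # S2 reads a[i] written by S1 in this iteration
    return a, b


def shifted(x):
    """Same computation with S1 shifted one iteration ahead (hypothetical example)."""
    n = len(x)
    a = [0] * n
    b = [0] * n
    if n == 0:
        return a, b
    a[0] = 2 * x[0]              # prologue: S1 of iteration 0
    for i in range(n - 1):
        b[i] = a[i] + 1          # S2 of iteration i (a[i] comes from the previous trip)
        a[i + 1] = 2 * x[i + 1]  # S1 of iteration i + 1, independent of the line above
    b[n - 1] = a[n - 1] + 1      # epilogue: S2 of the last iteration
    return a, b
```

Both versions compute the same arrays; only the position of S1 relative to the iteration boundary changes.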

5.

Circuit retiming applied to decomposed software pipelining

Calland P.-Y. Darte A. Robert Y. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(1):24-35

This paper elaborates on a new view on software pipelining, called decomposed software pipelining. The approach is to decouple the problem into resource constraints and dependence constraints. Resource constraints management amounts to scheduling an acyclic graph subject to resource constraints for which an efficiency bound is known, resulting in a bound for loop scheduling. The acyclic graph is obtained by cutting some particular edges of the (cyclic) dependence graph. In this paper, we cut edges in a different way, using circuit retiming algorithms, so as to minimize both the longest dependence path in the acyclic graph, and the number of edges in the acyclic graph. With this technique, we improve the efficiency bound given for the Gasperoni and Schwiegelshohn algorithm, and we reduce the constraints that remain for the acyclic problem. We believe this framework to be of interest because it brings a new insight into the software pipelining problem by establishing its deep link with the circuit retiming problem.
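
As a rough illustration of how retiming chooses which edges end up in the acyclic graph, the sketch below applies the standard retiming rule w_r(u→v) = w(u→v) + r(v) − r(u) to a small dependence graph and measures the longest path over zero-distance edges. The three-node graph, the unit delays, the retiming values, and the function names are assumptions made for the example; the paper's optimization algorithm itself is not reproduced here.

```python
from functools import lru_cache


def retime(edges, r):
    """Apply retiming r to edges (u, v, distance): new distance = w + r[v] - r[u]."""
    retimed = [(u, v, w + r[v] - r[u]) for (u, v, w) in edges]
    if any(w < 0 for (_, _, w) in retimed):
        raise ValueError("illegal retiming: negative dependence distance")
    return retimed


def critical_path(delay, edges):
    """Longest delay-weighted path that uses only zero-distance edges.

    Assumes the zero-distance subgraph is acyclic, which holds when every
    dependence cycle carries a positive total distance.
    """
    succ = {}
    for u, v, w in edges:
        if w == 0:
            succ.setdefault(u, []).append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        return delay[u] + max((longest_from(v) for v in succ.get(u, [])), default=0)

    return max(longest_from(u) for u in delay)


# Hypothetical cyclic dependence graph: A -> B -> C within an iteration,
# and C -> A carried across two iterations.
delay = {"A": 1, "B": 1, "C": 1}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2)]

before = critical_path(delay, edges)
after_edges = retime(edges, {"A": 0, "B": 0, "C": 1})
after = critical_path(delay, after_edges)
```

In this example the retiming moves one unit of distance from the edge C → A onto B → C, which shortens the longest zero-distance path from 3 to 2 and leaves a single zero-distance edge for the acyclic scheduling step.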

6.

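A Data Speculation Method in Modulo Scheduling

Zhong Ming (钟明) Guo Zhenyu (郭振宇) Tang Zhizhong (汤志忠) 《计算机应用与软件》 (Computer Applications and Software) 2005,22(10):14-16

Software pipelining is an important instruction scheduling technique that raises instruction-level parallelism by overlapping the execution of different loop iterations. Modulo scheduling is an important class of software pipelining algorithms, but conservative dependence analysis can introduce many ambiguous dependences, which keeps modulo scheduling from producing efficient schedules. Data speculation overcomes the scheduling restrictions imposed by conservative dependence analysis and exposes latent parallelism. This paper proposes a data speculation method for modulo scheduling, implements it in the open-source compiler ORC, and tests it with the SPEC2000 benchmarks. The experimental results show that the method performs well.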

7.

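Software Pipelining Based on Path Grouping and Data Dependence Relaxation

Rong Hongbo (容红波) Tang Zhizhong (汤志忠) 《软件学报》 (Journal of Software) 2001,12(4):544-555

Software pipelining is an important loop scheduling method, but pipelining loops with branches remains a hard problem. Existing algorithms fall into four classes: loop linearization, path separation, global scheduling, and path selection. None of them reconciles two opposing problems: minimizing transition time and the worst-case constraint problem. This paper proposes a software pipelining framework based on path grouping and data dependence relaxation that tries to solve both without conflict. The main ideas are: (1) path grouping, which groups paths by their execution and transition probabilities so as to minimize transition time; (2) data dependence relaxation, which tries to avoid worst-case constraints: when a loop has multiple paths, some dependences do not necessarily have instances during execution, and the ideal strategy is to honor a dependence only when it has an instance. Preliminary experiments and qualitative analysis show that this…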

8.

Specification of software pipelining using Petri nets

M. Rajagopalan V. H. Allan 《International journal of parallel programming》1994,22(3):273-301

This paper presents a flexible model for software pipelining using Petri nets. Our technique, called the Petri Net Pacemaker (PNP), can create near optimal pipelines with less algorithmic effort than other techniques. The pacemaker is a novel idea which exploits the cyclic behavior of Petri nets to model the problem of scheduling operations of a loop body for software pipelining. A way of improving the performance of loops containing predicates is given. The PNP technique also shows how nested loops can be pipelined. A comparison with some of the other techniques is presented. This work was partially supported by the National Science Foundation under grants CDA-9100788 and CDA-9200371.

9.

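Elastic Data Dependences and Software Pipelining

Rong Hongbo (容红波) Tang Zhizhong (汤志忠) 《软件学报》 (Journal of Software) 2001,12(6):894-906

The worst-case path is a major obstacle to software pipelining of loops with branches. In such loops, some data dependences (called elastic dependences) may or may not produce instances during dynamic execution. Accordingly, elastic dependences that severely limit parallelism can be replaced with less restrictive fictitious dependences before software pipelining. If the schedule violates an original elastic dependence, a push-down transformation repairs it. This relieves or completely removes the worst-case path restriction. The method complements classical control speculation; its distinguishing feature is that it lets the schedule contain errors and then corrects them.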

10.

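A De-pipelining Algorithm for Software Pipelining

Tang Zhizhong (汤志忠) Li Wenlong (李文龙) Su Bogong (苏伯珙) 《软件学报》 (Journal of Software) 2004,15(7):987-993

Software pipelining is a loop optimization technique that is widely used in modern optimizing compilers. To fully exploit the instruction-level parallelism of VLIW DSP processors, DSP programs must be optimized with software pipelining. However, when the sequential source code is not available, it is very difficult to transform, understand, test, and debug the software-pipelined code and to convert it into code for other processors. This paper proposes a de-pipelining technique that converts software-pipelined, optimized assembly code back into semantically equivalent code. Preliminary experiments on 20 programs verify the correctness of the proposed de-pipelining algorithm.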

11.

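Register Requirements of Software Pipelining on IA-64

Lin Haibo (林海波) Li Wenlong (李文龙) Tang Zhizhong (汤志忠) 《计算机研究与发展》 (Journal of Computer Research and Development) 2004,41(1):22-27

Software pipelining is one of the important methods for exploiting instruction-level parallelism in loops, and IA-64 is an EPIC architecture that supports software pipelining. Based on a quantitative analysis of the registers required by the software-pipelinable loops in the NAS Benchmarks, this paper proposes a heuristic algorithm that limits the loop unrolling factor. It effectively solves the problem of software pipelining failing because too few registers are available, and it speeds up application execution.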

12.

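A Cost Model and Decision Framework for Software Pipelining

Li Wenlong (李文龙) Lin Haibo (林海波) Tang Zhizhong (汤志忠) 《软件学报》 (Journal of Software) 2004,15(7):1005-1011

Software pipelining is an important instruction scheduling technique that raises instruction-level parallelism (ILP) by overlapping the execution of different loop iterations. Modulo scheduling is a widely used class of software pipelining algorithms. Software pipelining is not a cost-free optimization: it has overheads such as longer compilation time and higher register pressure. Moreover, constrained by the architecture, the scheduling algorithm, and program characteristics, software pipelining does not always reach the ideal speedup and sometimes even degrades performance. This paper proposes a cost model for software pipelining oriented toward program characteristics, quantitatively analyzes the software pipelining cost under this model, and proposes a dependence-analysis-based…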

13.

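A New Multi-Branch Global Software Pipelining Method

Tang Zhizhong (汤志忠) Zhang Chihong (张赤红) Chen Gang (陈刚) 《软件学报》 (Journal of Software) 1996,7(1):16-24

In architectures with a high degree of instruction-level parallelism, multiple branch control units are usually needed to obtain good parallel optimization results. This paper proposes GPMB, a new global software pipelining method that supports parallel execution of multiple branch operations. It compares GPMB with several other global software pipelining methods on the two main measures of such methods, time overhead and space overhead. Simulation results show that GPMB has low time and space overhead and needs relatively little hardware support.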

14.

The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor

Michael Gschwind 《International journal of parallel programming》2007,35(3):233-262

As CMOS feature sizes continue to shrink and traditional microarchitectural methods for delivering high performance (e.g., deep pipelining) become too expensive and power-hungry, chip multiprocessors (CMPs) become an exciting new direction by which system designers can deliver increased performance. Exploiting parallelism in such designs is the key to high performance, and we find that parallelism must be exploited at multiple levels of the system: the thread-level parallelism that has become popular in many designs fails to exploit all the levels of available parallelism in many workloads for CMP systems. We describe the Cell Broadband Engine and the multiple levels at which its architecture exploits parallelism: data-level, instruction-level, thread-level, memory-level, and compute-transfer parallelism. By taking advantage of opportunities at all levels of the system, this CMP revolutionizes parallel architectures to deliver previously unattained levels of single chip performance. We describe how the heterogeneous cores achieve this performance by parallelizing and offloading computation-intensive application code onto the Synergistic Processor Element (SPE) cores using a heterogeneous thread model with SPEs. We also give an example of scheduling code to be memory latency tolerant using software pipelining techniques in the SPE. This paper is based in part on “Chip multiprocessing and the Cell Broadband Engine”, ACM Computing Frontiers 2006.

15.

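A De-pipelining Algorithm for Software Pipelining on IA-64

Wang Miao (汪淼) Zhao Rongcai (赵荣彩) Cai Guoming (蔡国明) 《计算机工程与应用》 (Computer Engineering and Applications) 2007,43(23):58-60

Software pipelining is a loop optimization technique that effectively raises instruction-level parallelism. Because processors are implemented differently, loop code that has been software pipelined for one processor is hard to port to and use on another. De-pipelining is the inverse of software pipelining: it removes the software pipelining characteristics from loop code so that the code can be ported across platforms. Based on the IA-64 architecture, this paper analyzes the characteristics of software-pipelined code and proposes a de-pipelining algorithm that removes software pipelining characteristics from executable binaries produced by the ICC compiler and converts them into semantically equivalent C code.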

16.

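Three Algorithms for Improving the Effectiveness of Software Pipelining: Comparison and Combination

Li Wenlong (李文龙) Chen Yu (陈彧) Lin Haibo (林海波) Tang Zhizhong (汤志忠) 《软件学报》 (Journal of Software) 2005,16(10):1822-1832

Software pipelining is a technique for exploiting instruction-level parallelism in loops; it speeds up a loop by executing several consecutive iterations in parallel. Overlapping iterations increases register requirements and hence register pressure, and software pipelining may fail when the target processor does not provide enough registers. This paper evaluates the register requirements of software-pipelined loops in the NAS and SPEC2000 benchmarks on the Itanium processor and finds that a shortage of static registers is the main cause of software pipelining failure. It proposes three algorithms that increase the number of software-pipelined loops and improve the effectiveness of software pipelining: register sensitive unrolling (RSU), which limits the loop unrolling factor; stacked register allocation (SRA); and variable type conversion (VTC). RSU chooses a reasonable unrolling factor from the static register requirement, which raises the success rate of software pipelining; SRA and VTC use idle stacked registers and rotating registers, respectively, in place of static registers, which improves register utilization. The three algorithms are implemented in ORC (Open Research Compiler), an open-source compiler for the Itanium processor; their effectiveness is compared on the NAS programs, and their combined use is also studied experimentally.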

17.

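An Optimization Method for Avoiding Cache Penalty in Modulo Scheduling

Liu Li (刘利) Li Wenlong (李文龙) Guo Zhenyu (郭振宇) Li Shengmei (李胜梅) Tang Zhizhong (汤志忠) 《软件学报》 (Journal of Software) 2005,16(10):1842-1852

Software pipelining speeds up loop execution, and modulo scheduling is a widely used heuristic for software pipelining. To improve the memory system, caches are organized hierarchically, but this also introduces extra memory latency, the cache penalty. This paper shows that modulo scheduling can incur cache penalty and proposes PCPMS (prevent cache penalty in modulo scheduling), an algorithm that avoids it. Experimental results show that PCPMS avoids cache penalty in modulo scheduling and improves program performance.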

18.

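Research on a De-pipelining Algorithm for IA-64 Software Pipelining

Cui Pingfei (崔平非) Pang Jianmin (庞建民) Zhao Rongcai (赵荣彩) Cui Xuebing (崔雪冰) 《计算机应用》 (Journal of Computer Applications) 2006,26(8):1919-1921

Software pipelining is a technique for exploiting instruction-level parallelism in loops; it speeds up a loop by executing several consecutive iterations in parallel. In reverse engineering, however, software pipelining makes reverse translation difficult. To address this, a de-pipelining algorithm is proposed for the IA-64 platform. It processes assembly code whose loops contain software pipelining and converts it back into semantically equivalent sequential code. Experiments verify the effectiveness of the algorithm, which lays a foundation for handling software-pipelined code in binary translation.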

19.

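Resource-Constrained Pipeline Scheduling on FPGAs

Song Jian (宋健) Ge Yingzeng (葛颖增) Dou Yong (窦勇) 《计算机工程》 (Computer Engineering) 2008,34(15):44-46

Loops are among the most time-consuming parts of a program. Pipelining can accelerate loop execution but needs a large amount of computing resources. Because FPGA resources are limited, hand-designing a pipeline when accelerating loop code on an FPGA is not practical. This paper uses software pipelining to map loops onto the FPGA automatically and performs pipeline scheduling under resource constraints. By exploring all or part of the space of resource combinations, a design that balances performance and area can be selected.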

20.

Split-path enhanced pipeline scheduling

SangMin Shim Soo-Mook Moon 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(5):447-462

Software pipelining increases the loop execution throughput by overlapping the execution of successive iterations in a pipelined fashion. For loops with control flows, however, software pipelining is not straightforward because we need to consider the overlap of more than one execution path. Modulo scheduling simply transforms them into straightline loops through if-conversion which, in effect, achieves a fixed, worst-case initiation interval (II) among all paths. On the other hand, all-path pipelining (APP) and enhanced pipeline scheduling (EPS) can achieve a variable II depending on the path that is taken at execution time. Unfortunately, APP concentrates only on the overlap within the same path, entirely losing the overlap between different paths, whereas EPS attempts to overlap all paths together, failing to produce a tight schedule for each individual path, especially when resource constraints are tight. In this paper, we propose a new approach to EPS called split-path EPS (SP-EPS), which first splits each individual path via tail duplication and then performs EPS in a way to guarantee a tight schedule for each path, while producing a competitive cross-path schedule. We also extend SP-EPS to outer loops such that frequent paths that bypass the inner loop are split and then scheduled by SP-EPS. Our experimental results on nontrivial integer benchmarks show that SP-EPS can achieve as much as a geometric mean of 10 percent speedup over EPS when innermost loops are scheduled by SP-EPS, while it can achieve a geometric mean of 11.9 percent speedup when outer loops are also scheduled by SP-EPS.
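
To make the throughput claim in the first sentence concrete, the following sketch compares cycle counts with and without overlap. It assumes a single-path loop whose per-iteration schedule spans `stages` stages of `ii` cycles each and a fixed initiation interval; the function names are hypothetical, and the variable-interval, multi-path case that SP-EPS targets is not modeled.

```python
def sequential_cycles(n_iters, stages, ii):
    """Cycles when each iteration finishes before the next one starts."""
    return n_iters * stages * ii


def pipelined_cycles(n_iters, stages, ii):
    """Cycles when a new iteration starts every `ii` cycles.

    The last iteration starts at cycle (n_iters - 1) * ii and needs
    stages * ii cycles to drain, giving (n_iters + stages - 1) * ii in total.
    """
    if n_iters == 0:
        return 0
    return (n_iters + stages - 1) * ii


# Illustrative numbers only: 100 iterations, 3 stages, initiation interval 2.
seq = sequential_cycles(100, 3, 2)
pipe = pipelined_cycles(100, 3, 2)
```

With these illustrative numbers the sequential loop takes 600 cycles and the overlapped loop 204, so for long loops the cost per iteration approaches the initiation interval rather than the full schedule length.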