
1.

刘旸, 张兆庆. 《计算机学报》, 2004, 27(9): 1198-1206

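Itanium processors introduce a hardware-managed register stack. The register stack engine (RSE) automatically adjusts the register stack frame pointer and saves and restores stacked registers, which effectively reduces the saving and reloading of register values across procedure calls. The number of stacked registers each procedure uses can be specified explicitly with the alloc instruction. Conventional intraprocedural register allocators give each procedure the maximum number of stacked registers it needs, but excessive use of stacked registers causes register stack spills and fills. If these occur frequently, program performance suffers badly. This paper proposes a new algorithm that effectively reduces the RSE cost. The algorithm has been implemented in the open-source compiler ORC. Experiments show that SPECint2000 performance generally improves with the algorithm: perlbmk improves by 14% and crafty by 3.2%.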

2.

A Study of Several Aspects of the Register Mechanism in the IA-64 Architecture

吕克, 张凯. 《小型微型计算机系统》, 2003, 24(4): 737-738

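IA-64 is a new-generation 64-bit microprocessor architecture developed by Intel. Its design philosophy lies between traditional RISC (reduced instruction set computer) and parallel processors, and its distinctive register stack mechanism gives applications a large number of usable general-purpose registers. The authors designed and implemented a compiler supporting IA-64 and studied the IA-64 register structure and register stack rotation in some depth. Contrasting it with the register structures of traditional processor architectures, this paper describes the key features of implementing the register stack mechanism in a compiler.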

3.

Research on Control and Data Speculation Optimization Techniques (cited by: 1; self-citations: 0; citations by others: 1)

干戈, 连瑞琦, 张兆庆. 《计算机学报》, 2004, 27(7): 881-887

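Control speculation and data speculation are effective ways to increase instruction-level parallelism. To guarantee that speculative instructions execute correctly, two problems must be solved: deferring the exceptions raised by control-speculative instructions, and resolving alias ambiguity in data speculation. Both require hardware support, so earlier research was mostly carried out on simulators and focused on describing extensions to the simulator structure. IA-64 is the first architecture to support both optimizations at once. Building on this, the authors implemented control and data speculation optimization for the first time under a unified framework in the Open Research Compiler (ORC) for IA-64. Focusing on the compiler, this paper presents several core problems in speculative optimization and their solutions, including a new algorithm for maintaining the correctness of speculative code. Experimental results show that the approach is effective.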

4.

Register Allocation and Instruction Scheduling via a Register Queue Model

沈立, 肖晓强, 戴葵, 王志英. 《小型微型计算机系统》, 2004, 25(4): 757-761

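Register allocation and instruction scheduling are two important tasks in compiler optimization. Because the two phases are usually performed independently, register allocation often introduces unnecessary false dependences, which degrade the efficiency and results of instruction scheduling and limit the final performance gain. This paper proposes a register queue model and, on top of it, an algorithm that performs register allocation and instruction scheduling together. The algorithm uses the minimum number of registers while guaranteeing that every instruction executes at its earliest possible time. It also has linear time and space complexity and is easy to implement in hardware.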

5.

The Parallel Architecture of IA-64 and Its Register File


邓晴莺, 张民选, 蒋江. 《计算机工程》, 2008, 34(12): 13-15

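Simultaneous multithreading (SMT) can execute instructions from different threads in the same clock cycle, exploiting both instruction-level and thread-level parallelism. Explicitly parallel instruction computing (EPIC) focuses on cooperation between the compiler and the hardware. Register file design is very important in high-performance processor design, and the register stack and register stack engine are important means of improving its performance. This paper designs and implements a parallel environment comprising the parallelizing compiler OpenUH and EDSMT, an IA-64-based simultaneous multithreading architecture. Experiments show that the architecture suits most parallel applications; on the NAS parallel benchmarks it achieves an average performance improvement of 12.48% over SMTSIM.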

6.

Design and Implementation of a Compiler for the Embedded C02 Language

王昭顺, 杨树森, 李周芳. 《计算机应用研究》, 2004, 21(7): 96-98

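This paper describes the design and implementation of a high-level language compiler for the embedded processor Si02. It proposes a register allocation method specific to the Si02 processor, a circular stack mechanism, and gives some of the algorithms used in the compiler's key techniques, simplifying the implementation of the embedded compiler.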

7.

Linked-List-Based Handling of Multi-Register Operands in Intermediate Representation Design

刘章林, 石学林, 冯晓兵, 张兆庆. 《计算机工程》, 2006, 32(1): 25-27

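Using the simple but representative case of paired registers, this paper analyzes the key points that a compiler's intermediate representation must cover in order to use pairing information. Taking into account the needs of data-flow analysis, instruction scheduling, and register allocation in the compiler, it then proposes an intermediate representation based on a linked-list structure, together with an algorithm for constructing it. The proposed representation also takes compiler portability into account so that it can be implemented in different compilers.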

8.

A Study of the Register Requirements of Software Pipelining on IA-64 (cited by: 1; self-citations: 0; citations by others: 1)

林海波, 李文龙, 汤志忠. 《计算机研究与发展》, 2004, 41(1): 22-27

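Software pipelining is one of the important methods for exploiting instruction-level parallelism in loops, and IA-64 is an EPIC architecture that supports it. Through a quantitative analysis of the registers required by software-pipelinable loops in the NAS Benchmarks, this paper proposes a heuristic algorithm that limits the loop unrolling factor. The algorithm effectively solves the problem of software pipelining failing for lack of available registers and improves application execution speed.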

9.

Optimized Implementation of the Exponential Function in GCC Using a Table-Driven Algorithm


杨灿群, 王锋, 彭林, 杨学军. 《计算机工程与科学》, 2007, 29(5): 77-80

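Many areas of scientific computing require fast and accurate evaluation of transcendental functions such as exp, log, sin, and tan. Using a table-driven algorithm and exploiting features of the IA-64 architecture, this paper presents an optimized implementation of the exponential function (exp) in GCC. It improves the floating-point performance of the GCC compiler on IA-64 systems and lays the groundwork for efficient implementations of all transcendental functions on IA-64 and other platforms.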
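The abstract does not spell out the algorithm. As a rough illustration of what a table-driven exponential generally looks like, the Python sketch below reduces the argument so that e^x = 2^k · 2^(j/N) · e^r, looks up 2^(j/N) in a precomputed table, and approximates e^r with a short polynomial. The table size (32 entries), the polynomial degree, and the names `exp_table`, `TABLE`, and `LN2_OVER_N` are assumptions chosen for illustration; this is not the paper's GCC/IA-64 implementation.

```python
import math

TABLE_BITS = 5                       # assumed table size: 2**5 = 32 entries
N = 1 << TABLE_BITS
# Precomputed table of 2**(j/N) for j = 0 .. N-1.
TABLE = [2.0 ** (j / N) for j in range(N)]
LN2_OVER_N = math.log(2.0) / N


def exp_table(x: float) -> float:
    """Approximate e**x by range reduction plus a table lookup."""
    # Write x = m * (ln 2 / N) + r, so that |r| <= ln 2 / (2N).
    m = round(x / LN2_OVER_N)
    r = x - m * LN2_OVER_N
    # Split m = k * N + j with 0 <= j < N (divmod floors, so j is never negative).
    k, j = divmod(m, N)
    # Degree-4 Taylor polynomial for e**r on the small reduced interval.
    p = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6.0 + r * (1.0 / 24.0))))
    # e**x = 2**k * 2**(j/N) * e**r; ldexp applies the power of two.
    return math.ldexp(TABLE[j] * p, k)
```

With these assumed parameters the reduced argument satisfies |r| ≤ ln 2 / 64, so the truncated polynomial contributes a relative error on the order of 1e-12; a production libm routine would use a carefully fitted polynomial and extra-precision reduction constants instead.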

10.

Register Spilling in Swing Modulo Scheduling and Its Implementation in GCC

杨旸, 顾国昌. 《小型微型计算机系统》, 2007, 28(10): 1822-1826

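Software pipelining is an optimization that improves loop execution efficiency by exploiting parallelism among instructions from different parts of different loop iterations and executing those instructions in parallel. While it raises instruction-level parallelism, it also increases register pressure, and register spilling is an effective way to relieve that pressure. Swing modulo scheduling is a software pipelining algorithm that strives to reduce register pressure while producing near-optimal schedules; it has appeared as a new optimization pass in the latest version of GCC. Using GCC as the platform, this paper discusses register spilling in swing modulo scheduling and its engineering implementation, further strengthening the algorithm's ability to cope with register pressure.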

11.

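Three Algorithms for Improving the Effectiveness of Software Pipelining: Comparison and Combination (cited by: 1; self-citations: 0; citations by others: 1)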

李文龙, 陈彧, 林海波, 汤志忠. 《软件学报》, 2005, 16(10): 1822-1832

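Software pipelining is a technique for exploiting instruction-level parallelism in loops; it speeds up loop execution by executing several consecutive loop iterations in parallel. In software pipelining, overlapping loop bodies increases register requirements and thus register pressure, and when the target processor does not provide enough registers, software pipelining may fail. The register requirements of software-pipelined loops in the NAS and SPEC2000 benchmarks were evaluated on the Itanium processor, and a shortage of static registers was found to be the main cause of software pipelining failure. Three algorithms are proposed to increase the number of software-pipelined loops and improve the effectiveness of software pipelining: register sensitive unrolling (RSU), which limits the loop unrolling factor; stacked register allocation (SRA); and variable type conversion (VTC). RSU determines a reasonable unrolling factor from the static register requirement, raising the success rate of software pipelining. SRA and VTC use idle stacked registers and rotating registers, respectively, in place of static registers, improving register utilization. The three algorithms were implemented in ORC (Open Research Compiler), an open-source compiler for the Itanium processor. Their effectiveness was compared using the NAS programs, and their combined use was also studied experimentally.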

12.

A Test Scheduling Optimization Method for Parallel Multi-Layer and Multi-Core Testing of 3D Chips

陈田, 汪加伟, 安鑫, 任福继. 《计算机应用》, 2018, 38(6): 1795-1800

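To address the excessive cost of testing in the manufacture of three-dimensional (3D) chips, a scheduling method based on time-division multiplexing (TDM) is proposed that cooperatively optimizes test resources between layers and between layers and cores. First, shift registers are placed on each layer of the 3D chip; by controlling the input data through the shift register group, the test frequency is divided appropriately among the layers and among the cores on the same layer, so that cores at different locations can be tested in parallel. Second, a greedy algorithm is used to optimize register allocation and reduce idle cycles in parallel core testing. Finally, a discrete binary particle swarm optimization (DBPSO) algorithm is used to find the optimal 3D stacking layout, so as to make full use of the transmission potential of through-silicon vias (TSVs), improve parallel test efficiency, and reduce test time. Experimental results show that, under power constraints, the utilization of the test access mechanism (TAM) after optimization rises by 16.28% on average, while the test time of the 3D stack falls by 13.98% on average. The proposed method reduces test time and lowers test cost.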

13.

Shifting register windows

Russell G., Shaw P. 《IEEE Micro》, 1993, 13(4): 28-35

Shifting register windows, a register windowing method that attempts to overcome some of the difficulties of traditional fixed- and variable-sized schemes, is described. Using fewer register elements than a seven-window SPARC organization, shifting register windows more than halves spill/refill memory traffic and reduces visible spill/refill cycles by an order of magnitude. In addition, shifting register windows, a scheme based on a fast hardware stack and register-memory dribbling, has a very short register bus length. It also zeros registers as they are being allocated, making common initialization unnecessary.

14.

A Study of the Register Merging Problem in High-Level Synthesis

计算机工程. 《计算机工程》, 1999, 25(5): 1995

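Research on high-level synthesis is currently attracting considerable attention. During resource allocation, the registers required in data path synthesis should be merged in order to reduce the number of interconnects and improve design quality. Based on a study and analysis of the register merging problem, a register merging algorithm for high-level synthesis is presented. Experiments show that the algorithm is fast and efficient and that, when applied in a high-level synthesis system, it improves the quality of the synthesized design.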

15.

A Study of the Register Merging Problem in High-Level Synthesis (cited by: 2; self-citations: 0; citations by others: 2)

袁小龙, 高德远. 《计算机工程》, 1999, 25(7): 51-53

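Research on high-level synthesis is currently attracting considerable attention. During resource allocation, the registers required in data path synthesis should be merged in order to reduce the number of interconnects and improve design quality. Based on a study and analysis of the register merging problem, a register merging algorithm for high-level synthesis is presented. Experiments show that the algorithm is fast and efficient and that, when applied in a high-level synthesis system, it improves the quality of the synthesized design.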

16.

A Stack-Based Register File and Its Applications

王俊宇, 王昭顺, 王沁. 《计算机工程与应用》, 2001, 37(11): 42-44, 56

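This article presents the design principles and design techniques of a new stack-based register file. Unlike conventional stack-based register files, the design makes the top-of-stack register group general-purpose registers, which improves the access efficiency of that group, and through buffer registers and their control logic the device can adaptively store data of different word lengths. The device can be used in microprocessor designs that support postfix syntax or that place limits on encoding length.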

17.

Solving the Register Allocation Problem for Embedded Systems Using a Hybrid Evolutionary Algorithm

Topcuoglu H.R., Demiroz B., Kandemir M. 《IEEE Transactions on Evolutionary Computation》, 2007, 11(5): 620-634

Embedded systems are unique in the challenges they present to application programmers, such as power and memory space constraints. These characteristics make it imperative to design customized compiler passes. One of the important factors that shape the runtime performance of a given embedded code is the register allocation phase of compilation. It is crucial to provide aggressive and sophisticated register allocators for embedded devices, where excessive compilation time can be tolerated due to the high demand on code quality. Failing to do a good job of allocating variables to registers (i.e., determining the set of variables to be stored in the limited number of registers) can have serious power, performance, and code size consequences. This paper explores the possibility of employing a hybrid evolutionary algorithm for the register allocation problem in embedded systems. The proposed solution combines genetic algorithms with a local search technique. The algorithm exploits a novel, highly specialized crossover operator that takes into account domain-specific information. The results from our implementation, based on synthetic benchmarks and routines extracted from well-known benchmark suites, clearly show that the proposed approach is very successful in allocating registers to variables. In addition, our experimental evaluation indicates that it outperforms a state-of-the-art register allocation heuristic based on graph coloring in most of the cases tested.

18.

One-write algorithms for multivalued regular and atomic registers

Soma Chaudhuri, Martha J. Kosa, Jennifer L. Welch. 《Acta Informatica》, 2000, 37(3): 161-192

This paper presents an algorithm for implementing a k-valued regular register (the logical register) using binary regular registers (the physical registers) that requires only one physical write per logical write. The same algorithm using binary atomic registers implements a k-valued atomic register. The algorithm is simple to describe and depends on properties of paths in a related graph. Two lower bounds are given on the number of registers required by one-write implementations in the regular case. The first lower bound, , holds for a fairly general class of algorithms. The second lower bound holds for a restricted class of implementations and implies that our algorithm is optimal for this class. Both lower bounds improve on the best previously known lower bound, which was k. The two lower bounds also hold for the atomic case under further restrictions. Received: 9 June 2000

19.

An Energy-Efficient Processor Architecture for Embedded Systems (cited by: 1; self-citations: 0; citations by others: 1)

Balfour James, Dally William, Black-Schaffer David, Parikh Vishal, Park JongSoo. 《Computer Architecture Letters》, 2008, 7(1): 29-32

We present an efficient programmable architecture for compute-intensive embedded applications. The processor architecture uses instruction registers to reduce the cost of delivering instructions, and a hierarchical and distributed data register organization to deliver data. Instruction registers capture instruction reuse and locality in inexpensive storage structures that are located near to the functional units. The data register organization captures reuse and locality in different levels of the hierarchy to reduce the cost of delivering data. Exposed communication resources eliminate pipeline registers and control logic, and allow the compiler to schedule efficient instruction and data movement. The architecture keeps a significant fraction of instruction and data bandwidth local to the functional units, which reduces the cost of supplying instructions and data to large numbers of functional units. This architecture achieves an energy efficiency that is 23× greater than an embedded RISC processor.

20.

Register Saturation in Instruction Level Parallelism

Sid-Ahmed-Ali Touati. 《International Journal of Parallel Programming》, 2005, 33(4): 393-449

Register constraints are usually taken into account during the scheduling pass of an acyclic data dependence graph (DAG): any schedule of the instructions inside a basic block must keep the register requirement under a certain limit. In this work, we show how to handle the register pressure before the instruction scheduling of a DAG. We mathematically study an approach which consists in managing the exact upper bound of the register need for all the valid schedules of a considered DAG, independently of the functional unit constraints. We call this computed limit the register saturation (RS) of the DAG. Its aim is to detect possibly obsolete register constraints, i.e., when RS does not exceed the number of available registers. If it does, we add some serial edges to the original DAG such that the worst register need does not exceed the number of available registers. We propose an appropriate mathematical formalism for this problem. Our generic processor model takes into account superscalar, VLIW and EPIC/IA64 architectures. Our deeper analysis of the problem and our formal methods enable us to provide nearly optimal heuristics and strategies for register optimization in the face of ILP.
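To make the notion of register saturation concrete, the Python sketch below computes it by brute force for a tiny DAG. It rests on simplifying assumptions that are not taken from the paper: schedules are purely sequential (one instruction per step), every node produces one value, a value is live from its definition until its last consumer is issued, and values without consumers are not counted. The names `register_need`, `is_valid`, and `register_saturation` are hypothetical, and exhaustive enumeration is only an illustration, not the paper's heuristics.

```python
from itertools import permutations


def register_need(order, consumers):
    """Peak number of simultaneously live values for one sequential schedule."""
    # Number of consumers of each value that have not been issued yet.
    remaining = {v: len(cs) for v, cs in consumers.items()}
    # Invert the graph: for each node, the values it reads.
    producers = {v: [u for u, cs in consumers.items() if v in cs] for v in consumers}
    live, peak = set(), 0
    for v in order:
        # Operands are read first; a value dies once its last consumer is issued.
        for u in producers[v]:
            remaining[u] -= 1
            if remaining[u] == 0:
                live.discard(u)
        # The result becomes live only if some later node consumes it (assumption).
        if consumers[v]:
            live.add(v)
        peak = max(peak, len(live))
    return peak


def is_valid(order, consumers):
    """True if every producer is scheduled before all of its consumers."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[u] < pos[w] for u, cs in consumers.items() for w in cs)


def register_saturation(consumers):
    """Return (maximum, minimum) register need over all valid sequential schedules."""
    needs = [register_need(o, consumers)
             for o in permutations(consumers) if is_valid(o, consumers)]
    return max(needs), min(needs)


# Example DAG (each consumer listed once): d = a + b, e = d + c.
dag = {"a": ["d"], "b": ["d"], "c": ["e"], "d": ["e"], "e": []}
```

For this example, `register_saturation(dag)` returns `(3, 2)`: issuing a, b, c before d keeps three values live at once, whereas the order a, b, d, c, e never needs more than two. With three or more registers available, the register constraint can therefore be ignored while scheduling this DAG, which is the situation the abstract calls an obsolete constraint.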