1.

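田祖伟 李勇帆 《计算机科学》2009,36(3):45-47

As embedded processors are widely used across many fields, embedded software is becoming increasingly complex. Fully exploiting the performance of embedded processors requires the support of advanced compiler optimization techniques. Instruction scheduling is one of the key techniques by which a compiler exploits a program's instruction-level parallelism. This paper designs and implements an instruction scheduler that operates on assembly code. Experimental results show that integrating the instruction scheduler into the TECC embedded compiler significantly improves program performance.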

2.

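Analysis and Improvement of the GCC-Based IF-Conversion Algorithm Total citations: 1, self-citations: 0, citations by others: 1

田祖伟 赵克佳 《计算机科学》2005,32(7):242-244

Branch instructions are a major obstacle to exploiting instruction-level parallelism (ILP). IF-conversion effectively removes branches from the instruction stream: by deleting certain branches in a program, it converts control dependences into data dependences and thereby enables better scheduling results. This paper analyzes in detail the IF-conversion algorithm in GCC, which is based on IA-64 predicated execution, and improves the algorithm. Experimental data show a clear optimization effect.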

3.

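Research on Predication and Speculation Techniques in Instruction Scheduling

叶崴 马杰 侯朝焕 《微计算机应用》2006,27(6):691-693

The main obstacles to a compiler improving program parallelism are frequent control transfers and ambiguous memory accesses. Predication and speculation are new features of VLIW processor architectures, intended to eliminate the impact of branches and memory accesses on the identification of instruction-level parallelism. Instruction scheduling is one of the key techniques by which a compiler exploits a program's instruction-level parallelism. This paper discusses how to use predication and speculation effectively in instruction scheduling to improve program performance.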

4.

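An Improved Power Optimization Strategy for the Instruction Bus

徐步荣 李曦 魏亮辉 《计算机辅助工程》2007,16(1):64-68

Addressing low-power optimization in compiler system design and compilation, this work implements, on a retargetable compiler, a strategy that optimizes the power consumption of the VLIW instruction bus in the compiler back end. The binary object code produced by compilation is horizontally rescheduled to reduce the number of high/low voltage transitions on the instruction bus, thereby lowering system power consumption. Comparative experiments on two back-end performance optimization strategies, software pipelining and superblock scheduling, show an optimization effect above 30% and a clear correlation between the code's instruction-level parallelism (ILP) and the optimization effect. Finally, the strategy is improved using ILP: instruction-level parallelism information guides the power optimization, saving up to 20% of the algorithm's overhead with little loss in power optimization effect.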
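The idea of horizontal rescheduling can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes, hypothetically, that every operation may occupy any slot of a long instruction word (real VLIW machines usually restrict this), and it uses a simple greedy search. All function names are invented for illustration.

```python
from itertools import permutations

def toggles(a, b):
    """Number of bus lines that switch between two instruction words (Hamming distance)."""
    return bin(a ^ b).count("1")

def pack(slots, width):
    """Concatenate slot encodings (each `width` bits wide) into one long instruction word."""
    word = 0
    for s in slots:
        word = (word << width) | s
    return word

def total_toggles(words, width):
    """Total bit transitions on the bus when the words are fetched in sequence."""
    packed = [pack(w, width) for w in words]
    return sum(toggles(a, b) for a, b in zip(packed, packed[1:]))

def reschedule(words, width):
    """Greedily permute the slots of each word to minimize toggles against the previous word.

    Assumption: slots are interchangeable, so any permutation is a legal encoding.
    """
    result = [tuple(words[0])]
    for slots in words[1:]:
        prev = pack(result[-1], width)
        best = min(permutations(slots), key=lambda p: toggles(prev, pack(p, width)))
        result.append(best)
    return result

# Three two-slot words with 4-bit slot encodings (made-up values).
words = [(0b1111, 0b0000), (0b0000, 0b1111), (0b1111, 0b0000)]
optimized = reschedule(words, 4)
```

In this toy input the original order flips all eight bus lines on every fetch, while the permuted order flips none. A greedy pass only looks one word back, so it is not guaranteed to find the global minimum.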

5.

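A Micro-Scheduling Method Supporting Directed Cyclic Graphs Total citations: 1, self-citations: 0, citations by others: 1

文严治 连瑞琦 吴承勇 冯晓兵 张兆庆 《计算机研究与发展》2005,42(3):387-393

Instruction scheduling is an important optimization phase in a compiler. How to make full use of processor-architecture-specific resources and exploit program parallelism, so as to improve compiler optimization performance and enhance code adaptability, has long been one of the difficult research problems in instruction scheduling. Micro-scheduling has achieved some success, but it has not supported the directed cyclic graphs produced by software pipelining. This paper proposes and implements in ORC a micro-scheduling method for the IA-64 architecture that supports directed cyclic graphs. It effectively reduces program execution cycles and pipeline stalls and achieves satisfactory compiler optimization performance.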

6.

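Research and Implementation of Instruction Scheduling in Dynamic Binary Translation

孙俊 文延华 漆锋滨 《计算机应用与软件》2008,25(1):17-19

Dynamic binary translation provides a way to automatically convert executable code generated for a source machine to a target machine without recompiling the source code, which solves the code compatibility problem well. Its core idea is to use the program's dynamic runtime information to find repeatedly executed code sequences, translate and optimize them, and reuse the results many times. Instruction scheduling, as an effective compiler optimization, is also applicable to dynamic binary translation. Based on an analysis of the GCC instruction scheduler, and taking into account the real-time nature of dynamic binary translation, this paper proposes an efficient, low-overhead instruction scheduling algorithm suited to dynamic binary translation.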

7.

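A DAG-Based Instruction Scheduling Optimization Algorithm Total citations: 1, self-citations: 0, citations by others: 1

陆伯鹰 尹宝林 《计算机工程与应用》2001,37(12):121-124

Instruction scheduling is a key technique in optimizing compilation, and it is especially important for CPUs with a VLIW architecture. Instruction scheduling is an optimization that, while preserving semantic correctness, changes the execution order of instructions to reduce idle cycles in the pipeline and thereby improve CPU performance. The paper focuses on the instruction scheduling problem in optimizing compilation, proposes an instruction scheduling algorithm and a method for simplifying the DAG, proves the algorithm's correctness, analyzes its efficiency, and compares the total execution time of the generated instruction sequence with that of the optimal sequence. It also proposes a better solution to problems in the instruction scheduling algorithm of the widely used compiler GCC.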
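As a rough illustration of scheduling over a dependence DAG, the following Python sketch implements textbook greedy list scheduling for a single-issue pipeline. It is not the algorithm proposed in the paper, whose details the abstract does not give; the instruction names, latencies, and the critical-path priority are assumptions chosen for the example.

```python
from collections import defaultdict

def in_order_cycles(instrs, deps, latency):
    """Cycles needed when instructions issue strictly in program order."""
    ready_at, cycle = {}, 0
    for i in instrs:
        # Stall until every operand is available.
        start = max([cycle] + [ready_at[p] for p in deps.get(i, ())])
        ready_at[i] = start + latency[i]
        cycle = start + 1
    return max(ready_at.values())

def list_schedule(instrs, deps, latency):
    """Greedy list scheduling; returns (new order, total cycles).

    instrs:  instruction names in original program order
    deps:    maps an instruction to the instructions it depends on
    latency: maps an instruction to the cycles until its result is ready
    """
    succs = defaultdict(list)
    for i in instrs:
        for p in deps.get(i, ()):
            succs[p].append(i)

    # Priority: longest latency-weighted path from the instruction to the DAG's end.
    prio = {}
    for i in reversed(instrs):  # program order is a valid topological order
        prio[i] = latency[i] + max((prio[s] for s in succs[i]), default=0)

    ready_at, order, cycle = {}, [], 0
    remaining = list(instrs)
    while remaining:
        ready = [i for i in remaining
                 if all(p in ready_at and ready_at[p] <= cycle
                        for p in deps.get(i, ()))]
        if ready:
            pick = max(ready, key=lambda i: prio[i])
            order.append(pick)
            ready_at[pick] = cycle + latency[pick]
            remaining.remove(pick)
        cycle += 1  # one instruction issued, or an idle (stall) cycle
    return order, max(ready_at.values())

# Two independent load/add pairs; loads take 3 cycles, adds take 1 (made-up numbers).
instrs = ["load_a", "add1", "load_b", "add2"]
deps = {"add1": ["load_a"], "add2": ["load_b"]}
latency = {"load_a": 3, "add1": 1, "load_b": 3, "add2": 1}
order, cycles = list_schedule(instrs, deps, latency)
```

For this input the original order takes 8 cycles, while the scheduler hoists `load_b` above `add1` and finishes in 5, because the second load overlaps with the latency of the first.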

8.

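Research on Predicate Analysis Techniques in Instruction-Level Parallelism Total citations: 2, self-citations: 0, citations by others: 2

芦运照 张兆庆 连瑞琦 《计算机学报》2003,26(10):1337-1342

Predicate support is a new feature of the IA-64 architecture. It provides more opportunities for exploiting instruction-level parallelism, while making compiler design more difficult. Predicates are the basis of conditional execution and a new way to increase instruction-level parallelism. This paper presents the predicate analysis technique based on the predicate partition graph, first designed and implemented in ORC (the IA-64 Open Research Compiler), and its application to instruction scheduling. Predicate analysis is used to build a predicate relation database, which the instruction scheduler queries to increase instruction-level parallelism. The paper focuses on constructing the predicate partition graph, the core of the predicate relation database, on which the computation and querying of predicate relations are implemented. Experimental results show that predicate analysis yields a significant optimization effect.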

9.

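A System Design Method Based on Thread Integration

蒋书波 刘仲辉 程明霄 《计算机工程与设计》2008,29(6):1380-1383

Achieving task parallelism in embedded systems is a basic means of improving system performance. By analyzing the main factors that affect embedded system performance, this work adopts a thread-based parallel design method for embedded systems and uses instruction-level parallelism to improve system performance. It mainly discusses how thread integration is implemented: compilation techniques merge multiple threads in instruction-level code to achieve task parallelism. The method is applied to the design of a display module for instruments.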

10.

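Optimization of Predicated Execution on Tiled Processors

邓春华 安虹 路璐 王耀彬 《小型微型计算机系统》2012,33(2):399-403

Predicated execution allows tiled processors to make full use of their many execution units and exploit instruction-level parallelism. However, the hyperblocks formed as a result also increase the branch misprediction penalty, so improving branch predictor performance is critical. This paper proposes a predicated execution technique driven by profile information: it uses profile information to estimate execution cycles before and after predication, and decides on that basis whether to predicate a branch. The technique raises the branch predictor hit rate by 0.68% to 3.50% and improves system performance by 1.67% to 8.33%. In addition, using select instructions to represent predicated instructions eliminates the problem of multiple register definitions in the renaming stage.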

11.

Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture

Ben Abdallah Abderazek Masashi Masuda Arquimedes Canedo Kenichi Kuroda 《The Journal of supercomputing》2011,57(3):314-338

This work presents a static method implemented in a compiler for extracting high instruction level parallelism for the 32-bit QueueCore, a queue computation-based processor. The instructions of a queue processor implicitly read and write their operands, making instructions short and the programs free of false dependencies. This characteristic allows the exploitation of maximum parallelism and improves code density. Compiling for the QueueCore requires a new approach since the concept of registers disappears. We propose a new efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts more parallelism than the optimizing compiler for an RISC machine by a factor of 1.38. Through the use of QueueCore’s reduced instruction set, we are able to generate 20% and 26% denser code than two embedded RISC processors.

12.

Profile-assisted instruction scheduling

William Y. Chen Scott A. Mahlke Nancy J. Warter Sadun Anik Wen-Mei W. Hwu 《International journal of parallel programming》1994,22(2):151-181

Instruction schedulers for superscalar and VLIW processors must expose sufficient instruction-level parallelism to the hardware in order to achieve high performance. Traditional compiler instruction scheduling techniques typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism for the frequent execution scenarios at the expense of the less frequent ones. Profile information identifies these important execution scenarios in a program. In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-assisted code scheduling techniques have been incorporated into the IMPACT-I compiler. These techniques are acyclic global scheduling and software pipelining. This paper describes the scheduling algorithms, highlights the modifications required to use profile information, and explains the hardware and compiler support for dealing with hazards that arise from aggressive use of profile information. The effectiveness of these profile-based scheduling techniques is evaluated for a range of superscalar and VLIW processors.

13.

Path Analysis and Renaming for Predicated Instruction Scheduling

Lori Carter Beth Simon Brad Calder Larry Carter Jeanne Ferrante 《International journal of parallel programming》2000,28(6):563-588

Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose and use such parallelism, traditional compiler data-flow and path analysis needs to be extended to predicated code. In this paper, we motivate the need for renaming and for predicates that reflect path information. We present Predicated Static Single Assignment (PSSA) which uses renaming and introduces Full-Path Predicates to remove false dependences and enable aggressive predicated optimization and instruction scheduling. We demonstrate the usefulness of PSSA for Predicated Speculation and Control Height Reduction. These two predicated code optimizations used during instruction scheduling reduce the dependence length of the critical paths through a predicated region. Our results show that using PSSA to enable speculation and control height reduction reduces execution time from 12 to 68%.

14.

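A New Architectural Concept: Virtual Registers and Parallel Instruction Processing Units Total citations: 4, self-citations: 1, citations by others: 3

李三立 廖恒 《小型微型计算机系统》1995,16(6):6-11

As programs' demand for address space grew, researchers proposed the concept of virtual memory, which frees the address space a program accesses from the limits of physical memory. With the development of register-oriented RISC technology and the growing importance of instruction scheduling in multiple-issue architectures, we propose the new concept of virtual registers, which frees the register space from the size of the physical register file, benefits instruction scheduling and register renaming, and increases instruction-level parallelism (ILP). Moreover, modern RISC processors all emphasize execution parallelism in the data processing units while neglecting the processing of instructions held in memory.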

15.

Compilation techniques for a reconfigurable LIW architecture

Rajiv Gupta Mary Lou Soffa 《The Journal of supercomputing》1989,3(4):271-304

Matching an application to an architecture in structure and size is a way of achieving higher computation speed. This paper presents a combination of a compiler and a reconfigurable long instruction word (RLIW) architecture as an approach to the matching problem. Configurations suitable for the execution of different parts of a program are determined by a compiler, and code is generated for both reconfiguring the hardware and performing the computation. The RLIW machine, consisting of multiple processing and global data memory modules, effectively utilizes the fine-grained parallelism detected in programs by a compiler. The long word instructions control the operation of processing and memory modules in the system. To reduce the data transfer between processing modules and data memory modules, we provide reconfigurable interconnections among the processing modules which permit direct communication. The compiler uses new techniques, including region scheduling, generation of code for reconfiguration of the system, and memory allocation techniques, to achieve improved performance. Algorithms for packing operations into long word instructions and techniques for effectively assigning memory modules to the operands required by an instruction are developed. Results of the experiments to evaluate the system indicate that speedups of 60–300% can be obtained for both scientific and nonscientific programs. The reconfigurable architecture is responsible for much of the speedup. Also, the results indicate that the major problem of memory bottleneck faced in designing parallel systems is successfully attacked. This paper represents work done while the author was at the University of Pittsburgh.

16.

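Dynamic Implicit Predication Based on Simplified Traces

唐遇星 邓鹍 窦勇 周兴铭 《计算机学报》2007,30(11):1972-1981

Branch instructions and branch mispredictions limit a processor's potential for exploiting instruction-level parallelism (ILP). Converting control dependences in a program into data dependences through if-conversion or predicated execution can substantially reduce branch prediction overhead. This paper proposes Dynamic Implicit Predication (DIP), a mechanism based on a simplified trace structure, whereas earlier related work focused mainly on compilers explicitly generating static predicated instructions for wide-issue processors. Without help from the compiler or other binary tools, DIP identifies at run time the instruction fragments that can be predicated, performs the instruction conversion and optimization, and uses the optimized instruction trace in subsequent execution. Simulation with SPEC2000 shows that DIP effectively avoids branch mispredictions and increases parallelism: the IPC of individual programs improves by 10.3% on average, and the average speedup over the benchmark programs reaches 7.59%.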

17.

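Register Renaming Techniques in Instruction Scheduling

张军超 张兆庆 《计算机工程》2005,31(23):8-10

Dependences between instructions are the main obstacle that keeps instruction scheduling from being effective and thus limits instruction-level parallelism. Register renaming is an important technique for resolving control dependences and data dependences. This paper studies and implements a register renaming technique for instruction scheduling. It achieves speedups of about 5% and 3% on 164.gzip and 186.crafty, respectively.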
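To make the renaming idea concrete, here is a minimal Python sketch. It is not the technique implemented in the paper: it assumes straight-line code, three-operand instructions written as `(dest, src1, src2)` tuples, and an unlimited supply of hypothetical virtual registers named `v0`, `v1`, and so on. Under those assumptions, giving every write a fresh name removes write-after-write and write-after-read conflicts and leaves only true read-after-write dependences.

```python
def rename_registers(code):
    """Give every destination a fresh virtual register.

    code: list of (dest, src1, src2) tuples in program order.
    Returns the renamed instruction list.
    """
    current = {}   # architectural register -> its latest renamed version
    renamed = []
    for counter, (dest, *srcs) in enumerate(code):
        # Sources read the most recent version of each register;
        # registers never written in this block keep their original name.
        new_srcs = [current.get(s, s) for s in srcs]
        fresh = f"v{counter}"
        current[dest] = fresh
        renamed.append((fresh, *new_srcs))
    return renamed

# The third instruction reuses r1, which blocks reordering it
# ahead of the first two instructions in the original code.
code = [
    ("r1", "r2", "r3"),
    ("r4", "r1", "r5"),
    ("r1", "r6", "r7"),
    ("r8", "r1", "r4"),
]
renamed = rename_registers(code)
```

After renaming, the third instruction writes `v2` instead of `r1`, so a scheduler is free to move it ahead of the first two. A practical implementation would also have to respect the number of physical registers and handle values that are live across basic blocks.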

18.

Evaluating the impact of reissued instructions on data speculative processor performance

Toshinori 《Microprocessors and Microsystems》2002,25(9-10):469-482

In this paper, we investigate the impact of instructions reissued due to misspeculated data dependences on processor performance. Recently, the practice of speculation in resolving data dependences has been studied as a means of extracting more instruction level parallelism. When a misspeculation occurs, it is necessary to revert the processor state to a safe point where the speculation is initiated, with an instruction reissue mechanism utilized for that purpose. The instruction reissue suffers less miss penalties than instruction squashing which handles misspeculated control flows in current generation processors, but causes redundant instruction dispatching, i.e. multiple copies of an instruction are in flight in functional units. The effectiveness of data speculation would be diminished, if reissued instructions caused serious structural hazards. Therefore, we evaluate how the instruction reissue affects processor performance using an execution-driven simulator. We find that overhead due to instruction reissue is sufficiently small so as to allow data speculation to contribute to processor performance.