共查询到18条相似文献,搜索用时 203 毫秒
1.
超长指令字处理器为了提高指令集并行(ILP)往往采用多个功能单元,从而需要多端口的寄存器文件提供支持.但是寄存器文件会随着端口的增多变得更复杂,频率难以提升,成为系统的瓶颈.分簇是解决这一问题的有效手段.分簇在不影响处理器ILP的前提下减少了每簇寄存器文件的端口数目,但对编译器提出了挑战,编译器必须将指令和操作数在簇间进行合理分配才能得到较好的指令级并行.针对分簇超长指令字结构提出了一种基于超块的统一分簇与模调度编译方法.使用超块技术可以增大调度范围以获得更好的ILP,并且可以处理含有控制流的循环体,增加了模调度的适用范围.超块中指令的分簇与模调度则是统一进行的,这将比分阶段进行有更好的优化效果,因为统一进行是从全局的角度寻求优化而非寻求各个阶段局部优化.在YHFT-DSP/700编译器中的实验结果表明,与ITSS算法相比,该算法可以达到较好的优化效果. 相似文献
2.
一种基于可重定向编译器的功耗优化框架 总被引:1,自引:0,他引:1
当今,低功耗设计成为系统设计中的关键问题之一,而编译中的低功耗优化也成为系统设计中的一个重要环节.文章针对传统功耗优化缺乏通用性的缺点,提出一个基于可重定向编译器的功耗优化框架.该框架通过对编译生成的二进制目标码进行横向再调度来降低指令总线上的高低电位切换次数,从而达到降低系统功耗的目的,并且,基于xpADL的支持,为该框架提供不同的体系结构描述,可以生成针对不同体系结构的功耗优化代码.以IA-64体系结构为例,在其仿真器Ski上作了大量实验,实验表明,对于静态代码,该框架的优化可达25%左右,对于动态代码,该框架可以达到30%以上的优化.因此,该框架的优化是行之有效的,并且具有相当的可扩展性. 相似文献
3.
基于汇编代码的指令调度器的设计与实现 总被引:1,自引:0,他引:1
随着嵌入式处理器在各个领域的广泛应用,嵌入式软件的复杂度越来越高.充分发掘嵌入式处理器的性能,需要高级编译优化技术的支持.指令调度是编译器发掘程序指令级并行性的关键技术之一.设计并实现了一个基于汇编代码的指令调度器.实验结果表明,在TECC嵌入式编译器中集成指令调度器后可显著提高程序的性能. 相似文献
4.
分簇结构超长指令字DSP编译器的设计与实现 总被引:5,自引:0,他引:5
超长指令字(VLIW)是高端DSP普遍采用的体系结构。VLIW DSP在硬件上没有调度和冲突判决的机制,其性能的发挥完全依靠编译嚣的优化效果.基于可重定向编译基础设施IMPACT,为分簇VLIW DSP YHFT—D4设计与实现了优化编译器.其中着重讨论了可重定向信息的定义、代码注释、SIMD指令的支持、分簇寄存器分配以度指令级并行开发和资源冲突解决等内容.实验结果表明该编译器可以达到较好的优化效果. 相似文献
5.
6.
随着社会信息化水平的不断提高,信息产业的快速发展,由此带来了能源的消耗也越来越高。特别是芯片集成度越来越高,系统应用越来越复杂,这就使得功耗问题成为嵌入式系统必须面对的一个关键问题。单纯的硬件功耗优化已经不能满足要求,基于软件的功耗优化取得了很好的成效。在编译阶段,通过减少总线的翻转次数来降低系统的功耗。针对指令地址总线,结合遗传算法进行函数段的分配,结合相关的编码策略,减少总线翻转,从而降低其功耗。针对数据总线,使用蚁群算法进行指令调度,用0-1翻转编码,有效减少了其总线翻转,降低了功耗。这种基于数据总线和地址总线的优化算法,能够在特定的实验平台下通过实验验证,算法对于总线功耗的优化效率大约为25%左右。 相似文献
7.
为了解决嵌入式系统设备总线的功耗问题,从软件方面的功耗优化入手,提出一种面向嵌入式系统总线的低功耗优化方法,即在编译阶段,分别对指令地址总线和数据总线进行优化,以减少总线的翻转次数,降低其功耗。具体方法为:针对指令地址总线,采用改进后的遗传算法进行函数段调用优化,然后结合T0编码,减少总线翻转次数,从而降低其功耗。针对指令数据总线,采用粒子群算法进行指令调度优化,然后结合0-1翻转编码,减少总线翻转次数,从而降低其功耗。为了验证上述方法的正确性和有效性,以HR6P系列微处理器为平台展开实验,实验结果表明,总线功耗的优化效率达到25%左右。该方法明显减少了总线的翻转次数,提高了系统的整体性能。 相似文献
8.
密码专用处理器常采用分簇式超长指令字(Very Long Instruction Word, VLIW)架构,其性能的发挥依赖于编译器的实现.当前对于通用VLIW架构的编译后端优化方案,在密码专用处理器上都有一定的不适应性.为此,本文提出了一种面向密码专用处理器的、同时进行簇指派、指令调度和寄存器分配的编译器后端优化方法.构造“定值-引用”链,求解变量的候选寄存器类型集合交集,确定其寄存器类型;实时评估可用资源,进行基于优先级的指令选择和基于平衡寄存器压力的簇指派;改进线性扫描算法,基于变量的“待引用次数”列表进行实时的寄存器分配.实验结果表明,本方法能够提升生成代码的性能,且算法是非启发式的,减小了编译所需的时间. 相似文献
9.
程序中大量分支指令的存在,严重制约了体系结构和编译器开发并行性的能力。有效发掘指令级并行性的一个主要挑战是要克服分支指令带来的限制。利用谓词执行可有效地删除分支,将分支指令转换为谓词代码,从而扩大了指令调度的范围并且删除了分支误测带来的性能损失。阐述了基于谓词代码的指令调度、软件流水、寄存器分配、指令归并等编译优化技术。设计并实现了一个基于谓词代码的指令调度算法。实验表明,对谓词代码进行编译优化,能有效提高指令并行度,缩短代码执行时间,提高程序性能。 相似文献
10.
11.
Traditionally, code scheduling is used to optimize the performance of an application, because it can rearrange the code to allow the execution of independent instructions in parallel based on instruction level parallelism (ILP). According to our observations, it can also be applied to reduce power dissipation by taking advantage of the properties of existing low-power techniques. In this paper, we present a power-aware code scheduling (PACS), which is a code scheduling integrated with power gating (PG) and dynamic voltage scaling (DVS) to reduce power consumption while executing an application. In other words, from the viewpoint of compilation optimization, PG and DVS can be applied simultaneously to a code and their impact can be enhanced by code scheduling to further save power. The result shows that when compared with hardware power gating, the proposed PACS can outperform by more than 33% and 41% in terms of energy delay product and energy delay2 product for DSPStone and Mediabench. 相似文献
12.
Dynamic and transparent binary translation 总被引:1,自引:0,他引:1
High-frequency design and instruction-level parallelism (ILP) are important for high-performance microprocessor implementations. The Binary-translation Optimized Architecture (BOA), an implementation of the IBM PowerPC family, combines binary translation with dynamic optimization. The authors use these techniques to simplify the hardware by bridging a semantic gap between the PowerPC's reduced instruction set and even simpler hardware primitives. Processors like the Pentium Pro and Power4 have tried to achieve high frequency and ILP by implementing a cracking scheme in hardware: an instruction decoder in the pipeline generates multiple micro-operations that can then be scheduled out of order. BOA relies on an alternative software approach to decompose complex operations and to generate schedules, and thus offers significant advantages over purely static compilation approaches. This article explains BOA's translation strategy, detailing system issues and architecture implementation 相似文献
13.
Variable assignment and instruction scheduling for processor with multi-module memory 总被引:1,自引:0,他引:1
Lei ZhangAuthor VitaeMeikang QiuAuthor Vitae Edwin H.-M. ShaAuthor Vitae Qingfeng ZhugeAuthor Vitae 《Microprocessors and Microsystems》2011,35(3):308-317
Multi-module memory has been employed in high-end digital signal processing system (DSP). It provides high memory bandwidth and low power operating mode for energy savings. However, making full use of these architectural features is a challenging problem for code optimization. In this paper, we propose an integer linear programming (ILP) model to optimize the performance and energy consumption of multi-module memories by solving variable assignment, instruction scheduling and operating mode setting problems simultaneously. The combined effect of performance and energy saving requirements has been considered as well. Specially, we develop two optimization techniques to improve the computation efficiency of our ILP model. The experimental results show that the optimal performance and energy solution can be achieved within a reasonable amount of time. 相似文献
14.
15.
Haijun Sun Zhibiao Shao Songtao Wang Gang Zou 《通讯和计算机》2005,2(5):41-45
A set of innovative optimization techniques are introduced in the 32.bit floating point RISC microprocessor for high performance operation, which include modified redundant Booth-3 algorithm for fast multiplication or division, dynamic SRAM mode control scheme for low power dissipation, embedded bus preselector improving the performance of bus interface, and the large capacity on-chip memory decreasing the amount of traffic with an external memory. The processor has been verified, each instruction and its random combinations run correctly, and the chip achieves 0.98 mA/MHz at 3.3V power supply, The chip testing shows that optimization techniques have improved the speed and quality of the processor with 38%, boosted frequency and 39% reduced oower dissipation. 相似文献
16.
17.
18.
Genggeng LIU Wenzhong GUO Rongrong LI Yuzhen NIU Guolong CHEN 《Frontiers of Computer Science》2015,9(4):576-594
This paper presents a high-quality very large scale integration (VLSI) global router in X-architecture, called XGRouter, that heavily relies on integer linear programming (ILP) techniques, partition strategy and particle swarm optimization (PSO). A new ILP formulation, which can achieve more uniform routing solution than other formulations and can be effectively solved by the proposed PSO is proposed. To effectively use the new ILP formulation, a partition strategy that decomposes a large-sized problem into some small-sized sub-problems is adopted and the routing region is extended progressively from the most congested region. In the post-processing stage of XGRouter, maze routing based on new routing edge cost is designed to further optimize the total wire length and mantain the congestion uniformity. To our best knowledge, XGRouter is the first work to use a concurrent algorithm to solve the global routing problem in X-architecture. Experimental results show that XGRouter can produce solutions of higher quality than other global routers. And, like several state-of-the-art global routers, XGRouter has no overflow. 相似文献