首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 203 毫秒
1.
针对编译器系统设计和编译中的低功耗优化,基于可重定向编译器,实现在编译器后端对VLIW指令总线进行功耗优化的策略.通过对编译生成的二进制目标码进行横向再调度来减少指令总线上的高低电位切换次数,达到降低系统功耗的目的.对编译后端的软件流水和超块调度两种性能优化策略进行对比实验,表明其优化效果在30%以上,并且代码的指令级并行性(Instruction Level Parallelism,ILP)与优化效果存在明显的相关性.最后,通过ILP对该策略提出改进,以指令级并行信息指导功耗优化,在功耗优化效果损失不大的前提下,可节省多达20%的算法开销.  相似文献   

2.
3.
可重构指令集处理器能够适应多变的计算任务在性能和灵活性两方面的要求,而传统的编译后端技术无法为其生成高效的可执行代码,需要有新的代码生成方法.针对传统编译后端代码生成三阶段方法进行扩展的代码混合优化生成算法正是这样一种方法.该算法很大程度地复用了原有的三阶段代码生成过程,同时针对可重构指令集具有动态性的特点,根据系统硬件资源和重构配置,扩展了针对可重构指令代码生成的优化处理,从而能够获得切合可重构指令集处理器体系结构特性的可执行代码.相关实验与分析说明了该算法针对硬件重构得到的新平台所做的可重构指令代码生成是有效的,能够较好地提高应用程序在新平台上的执行性能.  相似文献   

4.
在软硬件协同设计中,常常需要改变嵌入式处理器的体系结构,并评价其对系统各种优化目标的影响,以便产生高效的目标代码。可重定向编译技术正好能满足这一需求。可重定向编译器和传统编译嚣之间的本质区别在于前者要求编译器代码尽可能重用,以便辅助体系结构设计者探索设计空间。本文综述了已有的主要可重定向编译技术,并指出了所遇到的问题和困难。  相似文献   

5.
随着社会信息化水平的不断提高,信息产业的快速发展,由此带来了能源的消耗也越来越高。特别是芯片集成度越来越高,系统应用越来越复杂,这就使得功耗问题成为嵌入式系统必须面对的一个关键问题。单纯的硬件功耗优化已经不能满足要求,基于软件的功耗优化取得了很好的成效。在编译阶段,通过减少总线的翻转次数来降低系统的功耗。针对指令地址总线,结合遗传算法进行函数段的分配,结合相关的编码策略,减少总线翻转,从而降低其功耗。针对数据总线,使用蚁群算法进行指令调度,用0-1翻转编码,有效减少了其总线翻转,降低了功耗。这种基于数据总线和地址总线的优化算法,能够在特定的实验平台下通过实验验证,算法对于总线功耗的优化效率大约为25%左右。  相似文献   

6.
SIMD计算机的优化编译器设计   总被引:1,自引:1,他引:0       下载免费PDF全文
赵辉  黄石 《计算机工程》2009,35(1):201-203
利用处理器的相关资源,提高编译器优化性能和增强代码可适应性是SIMD处理器优化编译的关键。该文基于M语言和LSSIMD体系结构,结合现代编译器的编译技术,提出针对SIMD协处理器编译器的优化和实现方法,包括寄存器分配、单值合并、代码压缩等。实验结果表明,编译生成的目标代码准确、高效。  相似文献   

7.
能耗是设计嵌入式系统不可忽视的一个重要方面.针对嵌入式设备主要能耗来源之一的总线能耗,提出了一种基于总线翻转编码的低功耗指令调度方法.该方法以程序执行频度的profile信息为指导,利用数据随机性增强算法调度指令,获得适应总线翻转编码的指令序列,既减少总线翻转次数,又获得较为平衡的总线使用率,最终达到节约能耗的目的.以MiBench测试用例集为基准进行的对比实验可以看出,该方法能够有效地减少总线翻转次数.相对于未编码优化的arm-linux-gcc的指令序列,平均优化率可达到26%左右.相对于VSI+BI方法,平均优化率也能达到10%以上.  相似文献   

8.
田祖伟  孙光 《计算机科学》2010,37(5):130-133
程序中大量分支指令的存在,严重制约了体系结构和编译器开发并行性的能力。有效发掘指令级并行性的一个主要挑战是要克服分支指令带来的限制。利用谓词执行可有效地删除分支,将分支指令转换为谓词代码,从而扩大了指令调度的范围并且删除了分支误测带来的性能损失。阐述了基于谓词代码的指令调度、软件流水、寄存器分配、指令归并等编译优化技术。设计并实现了一个基于谓词代码的指令调度算法。实验表明,对谓词代码进行编译优化,能有效提高指令并行度,缩短代码执行时间,提高程序性能。  相似文献   

9.
分簇结构超长指令字DSP编译器的设计与实现   总被引:5,自引:0,他引:5  
超长指令字(VLIW)是高端DSP普遍采用的体系结构。VLIW DSP在硬件上没有调度和冲突判决的机制,其性能的发挥完全依靠编译嚣的优化效果.基于可重定向编译基础设施IMPACT,为分簇VLIW DSP YHFT—D4设计与实现了优化编译器.其中着重讨论了可重定向信息的定义、代码注释、SIMD指令的支持、分簇寄存器分配以度指令级并行开发和资源冲突解决等内容.实验结果表明该编译器可以达到较好的优化效果.  相似文献   

10.
分块内存和多地址生成器(AGU)是DSP普遍采用的体系结构.传统的C语言编译器没有针对分块内存和多AGU结构进行代码优化,导致生成代码无法满足性能需求,影响了C语言编译器在数字信号处理领域的应用.为了解决这个问题,提出基于编译指示,与分块内存和多AGU结构相关的编译优化算法.该算法利用定义引用链和引用定义链中的数据流信息,为地址计算指令和访存指令分配AGU,从而提高生成代码的指令级并行度.实验结果表明此算法能够达到较好的优化效果.  相似文献   

11.
Digital signal processors (DSPs) with very long instruction word (VLIW) data‐path architectures are increasingly being deployed on embedded devices for multimedia processing applications. To reduce the power consumption and design cost of VLIW DSP processors, distributed register files and multibank register architectures are being adopted to reduce the number of read and write ports associated with register files, which presents new challenges for devising compiler optimization schemes. This paper addresses the issues of reducing the spill code for a VLIW DSP with distributed register files. Spill code produced by register allocation is traditionally handled by memory spills, but the multibank register‐file architecture provides the opportunity to spill‐out register values onto different register banks. We present a conceptual framework based on the universal and the proxy interference graphs to model the live ranges of registers for spilling codes to different register banks. Heuristic algorithms are then developed on the basis of this concept. By heuristically estimating the register pressure for each register file, we treat different register banks as optional spilling locations in addition to traditional spilling to memory. Experiments were performed on the parallel architecture core VLIW DSP with distributed register files by incorporating our proposed optimization schemes into an Open64‐based compiler. The experimental results show that our approach can improve the performances on average for DSPStone and MiBench benchmarks with spilling cases by 7.1% and 21.6%, respectively, compared with the one always handling spill code in memory. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

12.
In the framework of pipelined vector architecture, the efficiency of vector processing is assessed with respect to plasma MHD codes in nuclear fusion research. By using a vector processor, the FACOM 230-75 APU, the limit of the enhancement factor due to parallelism of current vector machines is examined for three numerical codes based on a fluid model. Reasonable speed-up factors of approximately 6,6 and 4 times faster than the highly optimized scalar version are obtained for ERATO (linear stability code), AEOLUS-R1 (nonlinear stability code) and APOLLO (1-1/2D transport code), respectively. Problems of the pipelined vector processors are discussed from the viewpoint of restructuring, optimization and choice of algorithms. In conclusion, the important concept of “concurrency within pipelined parallelism” is emphasized.  相似文献   

13.
软件流水是一种循环程序的优化技术,它可以有效地提高指令级并行性。由于处理机的实现方法各不相同,在一种处理机上经过软件流水优化后的循环代码很难在其它处理机中移植和使用。反软件流水是软件流水的逆向操作,它可以消除循环代码中的软件流水特性,以便于代码在不同平台上的移植。基于IA-64体系结构,分析了软件流水的代码特点,提出了反流水算法,用于将ICC编译器编译后的可执行二进制代码消除软件流水特性,转换成语义等价的C代码。  相似文献   

14.
针对当前Type-3克隆代码检测工具较少、效率偏低等问题,提出了一种基于Token的能有效检测Type-3克隆代码的检测方法。该方法同时能有效检测Type-1和Type-2克隆代码。首先将源代码Token化得到特定代码粒度的Token串,其次将所有Token串的定长子串进行映射,在对映射信息进行查询的基础上,利用编辑距离算法确定克隆对,然后通过并查集算法快速构建克隆群,最终反馈克隆代码信息。实现了原型工具FClones,利用基于代码突变的框架对工具进行了评价,并与领域内较优秀的两款工具NiCad及SimCad进行了对比。实验结果表明,FClones在检测三类克隆代码时查全率均不低于95%,查准率均不低于98%,能更好地检测Type-3克隆代码。  相似文献   

15.
Dynamic binary translation (DBT) is an important technique in virtualization, and in migrating legacy binaries to platforms based on a new architecture. However, poor profile information limits the process of optimization at runtime, so the DBT system may suffer from substantial overhead. In this paper, we design and implement a static-integrated optimization framework (SINOF) to improve the runtime performance for DBT. Combining static and dynamic approaches can greatly reduce the overhead of optimizing, profiling and translating for any program that runs repeatedly. Under this framework, once the source image has been executed, the profile information and target code will be saved in a software cache, and will be available for future runs. In the static phase, the saved code is analyzed and optimized based on the information collected in the previous run. Especially, we reorganize the code layout of the software cache. Experimental results show that the proposed framework can reduce run time by more than 30% on average compared to the original versions of DBT that the framework is based on.  相似文献   

16.
虚拟哈希页表(VHPT)是高性能微处理器系统实现虚拟地址到物理地址的转换映像,是存储管理的关键技术之一。本文在讨论IA64微处理器地址空间的基础上分析了单地址空间(SAS)和多地址空间(脑)模型的应用需求,研究了长格式、短格式两种页表映射机制,实现了基于这两种格式的64位虚地址空间的哈希地址算法,增强了虚地址转换的
性能。模拟结果表明,该设计与IA64架构兼容。  相似文献   

17.
《Applied Soft Computing》2008,8(1):788-797
This paper proposes two new algorithms based on the clonal selection principle for the design of spreading codes for DS-CDMA. The first algorithm follows a multi-objective approach, generating complex spreading codes with “good” auto as well as cross-correlation properties. It also enables spreading code design with no restrictions on the number of users or code length. The algorithm maintains a repertoire of codes that are subject to cloning and undergo a process of affinity maturation to obtain better codes. Results indicate that the produced code sets lie very close to the theoretical Pareto front. A second penalty function-based constrained optimization algorithm based on clonal selection is proposed. It is applied to the design of spreading codes with pre-defined power spectral density requirement. The results suggest that the algorithm is capable of lowering significantly, the power spectra at undesired frequencies. Therefore, with the proposed algorithm, a DS-CDMA transmitter can, for the first time, selectively transmit power across the transmission bandwidth and adjust to jammers and other interferers. This study illustrates that using two stages of multi-objective and constrained optimization, using the proposed clonal selection algorithms, is an effective code design strategy.  相似文献   

18.
代码生成技术的出现,为满足软件系统中重复性代码的自动生成、保障软件系统的健壮性和可维护性等需要提供了解决方案。目前业界针对Java EE企业应用的代码生成器在系统功能整合方面还存在不足,依赖于程序员基于生成的原型代码进行后续开发。本文提出了一个基于SSH2与权限管理框架Apache Shiro整合的代码生成器方案,有效解决了复杂业务中的多表关联以及权限管理问题,并探讨了实现过程中几个关键技术问题。  相似文献   

19.
Firmware是IA64体系结构的一个重要组成部分。在系统启动时,首先运行Firmware,它对系统的各部件进行初始化、配置和测试,同时还对操作系统的引导进行管理。在大规模并行计算机系统中,节点问的Firmware结构是设计系统时要解决的重要问题。本文提出一种基于IA64的大规模并行系统的Firmware的结构,并对其作了简要分析。  相似文献   

20.
《Computers & Fluids》2006,35(8-9):910-919
This report presents a comprehensive survey of the effect of different data layouts on the single processor performance characteristics for the lattice Boltzmann method both for commodity “off-the-shelf” (COTS) architectures and tailored HPC systems, such as vector computers. We cover modern 64-bit processors ranging from IA32 compatible (Intel Xeon/Nocona, AMD Opteron), superscalar RISC (IBM Power4), IA64 (Intel Itanium 2) to classical vector (NEC SX6+) and novel vector (Cray X1) architectures. Combining different data layouts with architecture dependent optimization strategies we demonstrate that the optimal implementation strongly depends on the architecture used. In particular, the correct choice of the data layout could supersede complex cache-blocking techniques in our kernels. Furthermore our results demonstrate that vector systems can outperform COTS architectures by one order of magnitude.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号