Similar Documents
20 similar documents found.
1.
Compiler performance reflects how fully the advantages of a computer system's architecture are realized, and compiler optimization is influenced by both the machine platform and the characteristics of the compiler. Compiler analysis is carried out between a target compiler and multiple reference compilers, and between a target platform and multiple reference platforms; that is, combinations of compilers and platforms form the basis of the analysis. Only with many such combinations can the largest possible performance-improvement headroom and a detailed optimization plan be identified for the target compiler, but adding compiler and platform combinations often adds an unquantifiable amount of ...

2.
The optimizing C compiler was designed and implemented by cross-compilation, using the Intel 80386 C compiler on a microcomputer as the development platform; it is the first supercomputer C compiler in China designed and implemented from the ground up. This paper first presents the design principles of the YH-2 optimizing C compiler, then describes its main system components and technical features in detail, and finally points out the further work that remains to be done.

3.
Register allocation is one of the most critical optimization techniques in a compiler. Feedback-directed optimization changes a program's future behavior based on trends observed in its current and previous runs, and it can supply register allocation with useful optimization information. Building on an analysis of the feedback-directed optimization framework in the Open64 compiler, its application to register allocation was implemented and extended on the ALPHA architecture, yielding good optimization results.
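To make the idea concrete, the sketch below shows one way profile feedback can steer a spill decision: each virtual register's uses are weighted by the measured execution frequency of the block containing them, and the candidate with the lowest weighted cost is chosen for spilling. The register counts, block frequencies, and data layout are invented for illustration; this is not Open64's actual global register allocator.

/* Profile-weighted spill-cost sketch (hypothetical data, not Open64 code). */
#include <stdio.h>

#define NVREG  4
#define NBLOCK 3

/* Block execution frequencies, as reported by profile feedback. */
static const double block_freq[NBLOCK] = { 1.0, 95.0, 4.0 };

/* Static count of uses of each virtual register in each block. */
static const int uses[NVREG][NBLOCK] = {
    { 2, 0, 1 },   /* v0: used mostly outside the hot block */
    { 0, 3, 0 },   /* v1: lives in the hot loop             */
    { 1, 1, 2 },   /* v2 */
    { 4, 0, 0 },   /* v3 */
};

int main(void) {
    int cheapest = 0;
    double best = 0.0;
    for (int v = 0; v < NVREG; v++) {
        double cost = 0.0;                 /* estimated dynamic spill cost */
        for (int b = 0; b < NBLOCK; b++)
            cost += uses[v][b] * block_freq[b];
        printf("v%d: weighted spill cost %.1f\n", v, cost);
        if (v == 0 || cost < best) { best = cost; cheapest = v; }
    }
    printf("spill candidate: v%d\n", cheapest);
    return 0;
}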

4.
Limited to the information visible at compile time and lacking precise input data sets and target-machine information, a compiler must make conservative assumptions to preserve program correctness and avoid performance regressions, and so it often falls short of the best possible performance. To overcome the limitations of static optimization, this paper studies the runtime optimization techniques used in Java virtual machines and, combining them with the LLVM compiler infrastructure, describes runtime optimization techniques for C/C++ programs.

5.
Modern compilers offer a large number of optimization options; choosing parameter values, choosing which options to combine, and deciding in what order to apply them become complex problems, among which the phase-ordering problem is the hardest. With improvements to traditional approaches (iterative compilation combined with heuristic search) and the emergence of new techniques (machine learning), it has become possible to build a relatively efficient and intelligent compiler auto-tuning framework. Surveying the related work of the past few decades, this paper summarizes previous ...
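The iterative-compilation side of such a framework can be pictured with a very small driver: compile the same program with several candidate flag sets, run each binary once, and keep the fastest. Everything below is an assumption made for illustration (the file names bench.c and bench_cand, the candidate flag sets, and the crude single-run timing); it is not a tuning framework from the surveyed papers, and it relies on POSIX clock_gettime.

/* Minimal iterative-compilation driver (sketch under the assumptions above). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const char *flags[] = {
        "-O1", "-O2", "-O3", "-O2 -funroll-loops", "-O3 -fno-tree-vectorize"
    };
    int n = (int)(sizeof flags / sizeof flags[0]);
    double best = -1.0;
    int best_i = -1;
    char cmd[256];

    for (int i = 0; i < n; i++) {
        /* Build this candidate. */
        snprintf(cmd, sizeof cmd, "gcc %s bench.c -o bench_cand", flags[i]);
        if (system(cmd) != 0) continue;

        /* Run it once and measure wall-clock time. */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (system("./bench_cand") != 0) continue;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

        printf("%-26s %.3f s\n", flags[i], secs);
        if (best_i < 0 || secs < best) { best = secs; best_i = i; }
    }
    if (best_i >= 0)
        printf("best flag set: %s\n", flags[best_i]);
    return 0;
}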

6.
Design of an Optimizing Compiler for SIMD Computers    (cited 1; 1 self, 0 other)
Zhao Hui, Huang Shi. 《计算机工程》 (Computer Engineering), 2009, 35(1): 201-203
Exploiting the processor's resources to improve compiler optimization and make the generated code more adaptable is the key to optimizing compilation for SIMD processors. Based on the M language and the LSSIMD architecture, and drawing on modern compilation techniques, this paper proposes optimization and implementation methods for a SIMD coprocessor compiler, including register allocation, single-value merging, and code compression. Experimental results show that the generated object code is correct and efficient.

7.
Bit-Operation Optimization for the IXP Network Processor    (cited 1; 1 self, 0 other)
This paper presents a compiler optimization technique for the IXP network processor instruction set. The technique introduces bit-level information into conventional data-flow analysis and uses pattern matching to generate efficient target code. Experimental data show that the bit-operation optimization reduces the number of generated instructions by 1.1%-3.7%.
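The pattern-matching step can be illustrated with a toy peephole pass over a made-up three-address IR: it folds a shift followed by a low-bit mask into a single bit-field-extract operation. The IR layout and the EXTRACT opcode are hypothetical stand-ins, not the IXP instruction set, and a real pass would also check that the intermediate value is dead afterwards.

/* Toy peephole matcher: fold "t = x >> k; r = t & mask" into one EXTRACT. */
#include <stdio.h>

typedef enum { OP_SHR, OP_AND, OP_EXTRACT, OP_NOP } Op;
typedef struct { Op op; int dst, src, imm; } Insn;   /* dst = src <op> imm */

/* Is imm a contiguous run of low bits (0...01...1)? Return its width. */
static int low_mask_width(unsigned imm) {
    int w = 0;
    while (imm & 1) { imm >>= 1; w++; }
    return imm == 0 ? w : -1;
}

int main(void) {
    Insn code[] = {
        { OP_SHR, /*t1*/2, /*x*/1, 8    },    /* t1 = x >> 8   */
        { OP_AND, /*r */3, /*t1*/2, 0xFF },   /* r  = t1 & 0xff */
    };
    int n = (int)(sizeof code / sizeof code[0]);

    for (int i = 0; i + 1 < n; i++) {
        int w;
        if (code[i].op == OP_SHR && code[i + 1].op == OP_AND &&
            code[i + 1].src == code[i].dst &&
            (w = low_mask_width((unsigned)code[i + 1].imm)) > 0) {
            /* Fold into: r = EXTRACT(x, start = shift amount, width = w). */
            code[i].op  = OP_EXTRACT;
            code[i].dst = code[i + 1].dst;
            code[i].imm = (code[i].imm << 8) | w;   /* pack start|width   */
            code[i + 1].op = OP_NOP;                /* drop the dead AND  */
        }
    }
    for (int i = 0; i < n; i++)
        printf("insn %d: op=%d dst=r%d src=r%d imm=0x%x\n",
               i, code[i].op, code[i].dst, code[i].src, code[i].imm);
    return 0;
}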

8.
Research on Machine-Dependent Optimizing Compilation Techniques for VLIW    (cited 2; 0 self, 2 other)
The performance of a VLIW architecture depends to a large extent on its compiler. Compiler optimization falls into two broad categories: traditional, machine-independent techniques, and techniques tailored to a specific machine platform. Machine-dependent optimization for VLIW should target the specific platform and, based on the characteristics of the very long instruction word architecture, consider how to fully exploit the hardware resources the machine provides, so that the software (compiler) and the hardware (CPU) are matched as closely as possible and highly efficient, highly parallel target code is generated. Starting from the characteristics of very long instruction words, this paper discusses implementation schemes for machine-dependent compiler optimization on VLIW architectures and identifies several key techniques for carrying out such optimization in practice.

9.
Research on Key Techniques of Data-Flow Analysis    (cited 2; 0 self, 2 other)
Data-flow analysis plays a critical role in compiler optimization; in particular, to build a high-performance optimizing compiler over which one has full technical control, studying data-flow analysis methods is indispensable. This paper introduces the basic concepts and principles of data-flow analysis and a method for solving data-flow equations, and, taking GCC as a concrete compiler, briefly analyzes how data-flow analysis is implemented in it.
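A worked example of the data-flow equations mentioned above: the classic liveness equations in[B] = use[B] ∪ (out[B] − def[B]) and out[B] = ∪ in[S] over the successors S of B, solved by round-robin iteration to a fixed point on a tiny hard-coded control-flow graph. The CFG and the use/def sets are invented for illustration; GCC's actual data-flow framework is far more elaborate.

/* Round-robin solver for liveness: in[B] = use[B] | (out[B] & ~def[B]),
 * out[B] = union of in[S] over successors S. Sets are bitsets of variables. */
#include <stdio.h>

#define NBLOCKS 4
#define MAXSUCC 2

static const int succ[NBLOCKS][MAXSUCC] = { {1,-1}, {2,3}, {1,-1}, {-1,-1} };
static const unsigned use_set[NBLOCKS] = { 0x1, 0x2, 0x5, 0x2 };  /* vars read    */
static const unsigned def_set[NBLOCKS] = { 0x6, 0x4, 0x2, 0x0 };  /* vars written */

int main(void) {
    unsigned in[NBLOCKS] = {0}, out[NBLOCKS] = {0};
    int changed = 1;
    while (changed) {                          /* iterate to a fixed point */
        changed = 0;
        for (int b = NBLOCKS - 1; b >= 0; b--) {   /* reverse order converges faster */
            unsigned o = 0;
            for (int k = 0; k < MAXSUCC; k++)
                if (succ[b][k] >= 0) o |= in[succ[b][k]];
            unsigned i = use_set[b] | (o & ~def_set[b]);
            if (o != out[b] || i != in[b]) { out[b] = o; in[b] = i; changed = 1; }
        }
    }
    for (int b = 0; b < NBLOCKS; b++)
        printf("B%d: in=%#x out=%#x\n", b, in[b], out[b]);
    return 0;
}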

10.
Understanding and Practice of Embedded Software Optimization    (cited 1; 0 self, 1 other)
1. Automatic optimization. The C compiler is the embedded-system programmer's basic tool; it is what turns the programmer's ideas and algorithms into machine code the processor can execute. All C compilers can perform various kinds of optimization. Taking gcc as an example, besides the common -O1, -O2, and -O3 optimization options, other optimization switches can be turned on as needed; their meanings are listed in Table 1.
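Since Table 1 is not reproduced here, a small example of what such switches look like in practice: whole files are optimized with command-line options such as gcc -O2 -funroll-loops, and individual functions can be pinned to a level with GCC's optimize attribute, as sketched below. The functions themselves are made up for illustration.

/* Controlling gcc optimization at two granularities (GCC-specific attribute). */
#include <stdio.h>
#include <stddef.h>

/* Ask gcc to compile just this hot routine at -O3. */
__attribute__((optimize("O3")))
static void scale(float *a, const float *b, float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = k * b[i];
}

/* Keep this rarely used diagnostic routine at -O0 so it stays easy to debug. */
__attribute__((optimize("O0")))
static void dump(const float *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        printf("%zu: %f\n", i, (double)a[i]);
}

int main(void) {
    float src[4] = { 1, 2, 3, 4 }, dst[4];
    scale(dst, src, 0.5f, 4);
    dump(dst, 4);
    return 0;
}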

11.
Compiler optimizations are difficult to implement and add complexity to a compiler. For this reason, compiler writers are selective about implementing them: they implement only the ones that they believe will be beneficial. To support compiler writers in this, we describe a method for measuring the costs and benefits of compiler optimizations, both individually and in synergy with other optimizations. We demonstrate our method by presenting results for the optimizations implemented in the Jikes Research Virtual Machine on the PowerPC and IA32 platforms. Copyright © 2006 John Wiley & Sons, Ltd.

12.
A VLIW machine issues and executes multiple parallel operations in a single machine cycle and thereby achieves a high degree of instruction-level parallelism; the dependence analysis and scheduling of these operations are left entirely to the compiler, so whether the parallel performance of VLIW can be fully exploited depends on the quality of the architecture-specific compiler. GCC, developed by GNU, is one of the most widely used compilation systems; it supports multiple languages and platforms, has an open structure, and can apply a range of mature, conventional optimization techniques to generate efficient code. This paper analyzes the structural characteristics of VLIW and of GCC and proposes a GCC-based design for a VLIW compilation system: GCC performs architecture-independent optimization, plus a small amount of architecture-dependent optimization, at the RTL intermediate-code level, while architecture-dependent optimization targeting the VLIW structure is performed at the assembly-code level. This makes full use of GCC's mature compilation technology to rapidly develop an efficient multi-language VLIW compilation system.

13.
A DAG-Based Instruction Scheduling Optimization Algorithm    (cited 1; 0 self, 1 other)
Instruction scheduling is a key technique in optimizing compilation, and it is especially important for CPUs with VLIW architectures. Instruction scheduling is an optimization that, while preserving program semantics, reorders instructions to reduce idle cycles in the pipeline and thereby improve CPU performance. This paper focuses on the instruction-scheduling problem in optimizing compilation, proposes an instruction-scheduling algorithm together with a method for simplifying the DAG, proves the algorithm's correctness, analyzes its efficiency, and compares the total execution time of the generated instruction sequence with that of the optimal sequence. It also proposes a better way to address problems in the instruction-scheduling algorithm of the popular compiler GCC.
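The general flavor of DAG-based scheduling (not the paper's specific algorithm) can be shown with a minimal list scheduler: each instruction's priority is its longest latency path to the end of the DAG, and each cycle the highest-priority ready instruction is issued on a single-issue machine. The example DAG, the latencies, and the single-issue assumption are all invented for illustration.

/* Minimal list scheduler over a dependence DAG (illustrative sketch). */
#include <stdio.h>

#define N 5
static const int lat[N] = { 2, 1, 3, 1, 1 };   /* instruction latencies        */
static const int dep[N][N] = {                  /* dep[i][j]=1: j depends on i */
    {0,1,1,0,0},
    {0,0,0,1,0},
    {0,0,0,1,1},
    {0,0,0,0,0},
    {0,0,0,0,0},
};

static int prio[N], done_cycle[N], scheduled[N];

/* Priority = longest latency path from this node to the end of the DAG. */
static int priority(int i) {
    if (prio[i]) return prio[i];
    int best = 0;
    for (int j = 0; j < N; j++)
        if (dep[i][j] && priority(j) > best) best = priority(j);
    return prio[i] = lat[i] + best;
}

int main(void) {
    for (int i = 0; i < N; i++) priority(i);
    int remaining = N;
    for (int cycle = 0; remaining > 0; cycle++) {
        int pick = -1;
        for (int i = 0; i < N; i++) {
            if (scheduled[i]) continue;
            int ready = 1;                       /* all predecessors finished? */
            for (int p = 0; p < N; p++)
                if (dep[p][i] && (!scheduled[p] || done_cycle[p] > cycle))
                    ready = 0;
            if (ready && (pick < 0 || prio[i] > prio[pick])) pick = i;
        }
        if (pick >= 0) {                         /* issue one instruction      */
            scheduled[pick] = 1;
            done_cycle[pick] = cycle + lat[pick];
            printf("cycle %d: issue I%d\n", cycle, pick);
            remaining--;
        }                                        /* otherwise: stall cycle     */
    }
    return 0;
}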

14.
Translation validation is a technique for ensuring that a translator, such as a compiler, produces correct results. Because complete verification of the translator itself is often infeasible, translation validation advocates coupling the verification task with the translation task, so that each run of the translator produces verification conditions which, if valid, prove the correctness of the translation. In previous work, the translation validation approach was used to give a framework for proving the correctness of a variety of compiler optimizations, with a recent focus on loop transformations. However, some of these ideas were preliminary and had not been implemented. Additionally, there were examples of common loop transformations which could not be handled by our previous approaches. This paper addresses these issues. We introduce a new rule Reduce for loop reduction transformations, and we generalize our previous rule Validate so that it can handle more transformations involving loops. We then describe how all of this (including some previous theoretical work) is implemented in our compiler validation tool TVOC.

15.
Chau-Wen Tseng. 《Software》 1997, 27(7): 763-796
Fortran D is a version of Fortran enhanced with data decomposition specifications. Case studies illustrate strengths and weaknesses of the prototype Fortran D compiler when compiling linear algebra codes and whole programs. Statement groups, execution conditions, inter-loop communication optimizations, multi-reductions, and array kills for replicated arrays are identified as new compilation issues. On the Intel iPSC/860, the output of the prototype Fortran D compiler approaches the performance of hand-optimized code for parallel computations, but needs improvement for linear algebra and pipelined codes. The Fortran D compiler outperforms the CM Fortran compiler (2.1 beta) by a factor of four or more on the TMC CM-5 when not using vector units. Its performance is comparable to the DEC and IBM HPF compilers on an Alpha cluster and SP-2. Better analysis, run-time support, and flexibility are required for the prototype compiler to be useful for a wider range of programs. © 1997 John Wiley & Sons, Ltd.

16.
Dynamic optimization relies on runtime profile information to improve the performance of program execution. Traditional profiling techniques incur significant overhead and are not suitable for dynamic optimization. In this paper, a new profiling technique is proposed that combines the strengths of software and hardware to achieve near-zero-overhead profiling. The compiler passes profiling requests as a few bits of information in branch instructions to the hardware, and the processor executes profiling operations asynchronously in available free slots or on dedicated hardware. The compiler instrumentation of this technique is implemented using an Itanium research compiler. The results show that accurate block profiling adds very little overhead to the user program in terms of program scheduling cycles. For example, the average overhead is 0.6% for the SPECint95 benchmarks. The hardware support required for the new profiling is practical. The technique is extended to collect edge profiles for continuous phase transition detection. It is believed that the hardware-software collaborative scheme will enable many profile-driven dynamic optimizations for EPIC processors such as the Itanium processors.

17.
This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed systems. The long-term goal of our project is to quantify the degree to which such a use is possible and develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a Software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in very few applications only. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, result in a 31% average improvement on our benchmarks.
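Of the hand optimizations listed, access privatization is the easiest to picture: giving each thread its own copy of a temporary keeps its accesses out of the shared (DSM) address space. The loop below is an ordinary OpenMP fragment written for illustration (build with gcc -fopenmp); it is not output of the paper's translator.

/* Access privatization sketch: scratch gets one local copy per thread. */
#include <stdio.h>
#define N 1000

int main(void) {
    static double a[N], b[N];
    for (int i = 0; i < N; i++) b[i] = i;

    double scratch;                 /* shared (and potentially remote) by default */
#pragma omp parallel for private(scratch)   /* privatized: one copy per thread    */
    for (int i = 0; i < N; i++) {
        scratch = 2.0 * b[i];       /* all accesses to scratch stay thread-local  */
        a[i] = scratch + 1.0;
    }

    printf("a[10] = %f\n", a[10]);
    return 0;
}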

18.
This paper presents new approaches to the validation of loop optimizations that compilers use to obtain the highest performance from modern architectures. Rather than verify the compiler, the approach of translation validation performs a validation check after every run of the compiler, producing a formal proof that the produced target code is a correct implementation of the source code. As part of an active and ongoing research project on translation validation, we have previously described approaches for validating optimizations that preserve the loop structure of the code and have presented a simulation-based general technique for validating such optimizations. In this paper, for more aggressive optimizations that alter the loop structure of the code, such as distribution, fusion, tiling, and interchange, we present a set of permutation rules which establish that the transformed code satisfies all the implied data dependencies necessary for the validity of the considered transformation. We describe the extensions to our tool voc-64 which are required to validate these structure-modifying optimizations. This paper also discusses preliminary work on run-time validation of speculative loop optimizations. This involves using run-time tests to ensure the correctness of loop optimizations whose correctness cannot be guaranteed at compile time. Unlike compiler validation, run-time validation must not only determine when an optimization has generated incorrect code, but also recover from the optimization without aborting the program or producing an incorrect result. This technique has been applied to several loop optimizations, including loop interchange and loop tiling, and appears to be quite promising. This research was supported in part by NSF grant CCR-0098299, ONR grant N00014-99-1-0131, and the John von Neumann Minerva Center for Verification of Reactive Systems.
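As a concrete instance of a structure-modifying loop optimization and of the run-time validation idea, the fragment below performs a loop interchange that is legal because no data dependence is carried by either loop, and then checks at run time that the two versions agree element-wise. The code is written for illustration and is not produced by voc-64.

/* Loop interchange plus a crude run-time agreement check (sketch). */
#include <math.h>
#include <stdio.h>
#define N 64

static double a[N][N], b[N][N], c1[N][N], c2[N][N];

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }

    /* Original loop nest: i outer, j inner. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c1[i][j] = a[i][j] + 2.0 * b[i][j];

    /* Interchanged nest: j outer, i inner. Legal because each c2[i][j]
       is written exactly once, independently of loop order. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            c2[i][j] = a[i][j] + 2.0 * b[i][j];

    /* Run-time validation: the two versions must agree element-wise. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (fabs(c1[i][j] - c2[i][j]) > 1e-12) {
                printf("mismatch at (%d,%d)\n", i, j);
                return 1;
            }
    printf("interchange validated on this input\n");
    return 0;
}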

19.
It is advantageous to perform compiler optimizations that attempt to lower the worst-case execution time (WCET) of an embedded application since tasks with lower WCETs are easier to schedule and more likely to meet their deadlines. Compiler writers in recent years have used profile information to detect the frequently executed paths in a program and there has been considerable effort to develop compiler optimizations to improve these paths in order to reduce the average-case execution time (ACET). In this paper, we describe an approach to reduce the WCET by adapting and applying optimizations designed for frequent paths to the worst-case (WC) paths in an application. Instead of profiling to find the frequent paths, our WCET path optimization uses feedback from a timing analyzer to detect the WC paths in a function. Since these path-based optimizations may increase code size, the subsequent effects on the WCET due to these optimizations are measured to ensure that the worst-case path optimizations actually improve the WCET before committing to a code size increase. We evaluate these WC path optimizations and present results showing the decrease in WCET versus the increase in code size. A preliminary version of this paper entitled "Improving WCET by optimizing worst-case paths" appeared in the 2005 Real-Time and Embedded Technology and Applications Symposium.

Wankang Zhao received his PhD in Computer Science from Florida State University in 2005. He was an associate professor at Nanjing University of Posts and Telecommunications. He is currently working for Datamaxx Corporation. William Kreahling received his PhD in Computer Science from Florida State University in 2005. He is currently an assistant professor in the Math and Computer Science department at Western Carolina University. His research interests include compilers, computer architecture and parallel computing. David Whalley received his PhD in CS from the University of Virginia in 1990. He is currently the E.P. Miles professor and chair of the Computer Science department at Florida State University. His research interests include low-level compiler optimizations, tools for supporting the development and maintenance of compilers, program performance evaluation tools, predicting execution time, computer architecture, and embedded systems. Some of the techniques that he developed for new compiler optimizations and diagnostic tools are currently being applied in industrial and academic compilers. His research is currently supported by the National Science Foundation. More information about his background and research can be found on his home page, http://www.cs.fsu.edu/~whalley. Dr. Whalley is a member of the IEEE Computer Society and the Association for Computing Machinery. Chris Healy earned a PhD in computer science from Florida State University in 1999, and is currently an associate professor of computer science at Furman University. His research interests include static and parametric timing analysis, real-time and embedded systems, compilers and computer architecture. He is committed to research experiences for undergraduate students, and his work has been supported by funding from the National Science Foundation. He is a member of ACM and the IEEE Computer Society. Frank Mueller is an Associate Professor in Computer Science and a member of the Centers for Embedded Systems Research (CESR) and High Performance Simulations (CHiPS) at North Carolina State University.
Previously, he held positions at Lawrence Livermore National Laboratory and Humboldt University Berlin, Germany. He received his Ph.D. from Florida State University in 1994. He has published papers in the areas of embedded and real-time systems, compilers and parallel and distributed systems. He is a founding member of the ACM SIGBED board and the steering committee chair of the ACM SIGPLAN LCTES conference. He is a member of the ACM, ACM SIGPLAN, ACM SIGBED and the IEEE Computer Society. He is a recipient of an NSF Career Award.

20.
Matching an application to an architecture in structure and size is a way of achieving higher computation speed. This paper presents a combination of a compiler and a reconfigurable long instruction word (RLIW) architecture as an approach to the matching problem. Configurations suitable for the execution of different parts of a program are determined by a compiler, and code is generated for both reconfiguring the hardware and performing the computation. The RLIW machine, consisting of multiple processing and global data memory modules, effectively utilizes the fine-grained parallelism detected in programs by a compiler. The long word instructions control the operation of processing and memory modules in the system. To reduce the data transfer between processing modules and data memory modules, we provide reconfigurable interconnections among the processing modules which permit direct communication. The compiler uses new techniques, including region scheduling, generation of code for reconfiguration of the system, and memory allocation techniques, to achieve improved performance. Algorithms for packing operations into long word instructions and techniques for effectively assigning memory modules to the operands required by an instruction are developed. Results of the experiments to evaluate the system indicate that speedups of 60–300% can be obtained for both scientific and nonscientific programs. The reconfigurable architecture is responsible for much of the speedup. Also, the results indicate that the major problem of memory bottleneck faced in designing parallel systems is successfully attacked. This paper represents work done while the author was at the University of Pittsburgh.
