共查询到20条相似文献,搜索用时 78 毫秒
1.
通用处理器的寄存器分配一般采用图着色的方法.除非考虑特例,优化的图着色是NP完全性问题.因此,传统寄存器分配常利用图着色的启发式算法,并能对规则的RISC处理器生成质量较高的代码.但由于嵌入式处理器不规则的体系结构特征,这种传统寄存器分配方法生成的代码质量不能满足嵌入式领域的要求.本文提出了一种新的遗传算法和局部搜索相混合的元启发式方法,能较好地克服传统寄存器分配的不足.实验结果表明,这种新的算法比传统图着色寄存器分配算法减少约30%spill代码. 相似文献
2.
BWDSP是一款自主设计的国产VLIW(超长指令字)数字信号处理器,支持SIMD技术,其SIMD指令可以在4个宏上同时执行4个32位计算,对寄存器使用有特殊规则,Open64编译器的寄存器分配策略并不适用于这种规则.本文对BWDSP SIMD指令的寄存器分配优化技术进行了研究,并在BWDSP的编译器OCC上得以实现. 相似文献
3.
4.
寄存器分配技术是编译器最为关键的优化技术之一.反馈式编译优化是一种基于程序当前和以前运行时的趋势来改变程序以后执行动作的技术,它能够提供给寄存器分配一些有用的优化信息.在分析Open64编译器反馈式编译优化技术的基础上,基于ALPHA结构实现和扩展了反馈式编译优化在寄存器分配中的应用,获得了较好的优化性能. 相似文献
5.
6.
SpMV的自动性能优化实现技术及其应用研究 总被引:1,自引:0,他引:1
在科学计算中,稀疏矩阵向量乘(SpMV)是一个十分重要且经常被大量调用的计算内核.由于SpMV一般实现算法的浮点计算和存储访问次数比率非常低,且其存储访问模式极为不规则,其实际运行性能往往很低.通过采用寄存器分块算法和启发式分块大小选择算法,将稀疏矩阵分成小的稠密分块,重用保存在寄存器中向量x元素,可以提高该计算内核的性能.剖析和总结了OSKI软件包所采用的若干关键优化技术,并进行了实际应用性能测试.测试表明,在实际应用这些优化技术的过程中,应用程序对SpMV的调用次数要达到上百次的量级,才能抵消由于应用这些性能优化技术所带来的额外时间开销,取得性能加速效果.在Pentium 4和AMD Athlon平台上,测试了10个矩阵,其平均加速比分别达到了1.69和1.48. 相似文献
7.
本文基于斯坦福大学设计的KernelC编译器ISCD,针对64位流处理器体系结构,设计实现了其核心VLIW编译器,并针对高性能计算应用需求进行优化,实现了分布式寄存器负载均衡和指令自动合并技术。实验结果表明,该编译器能够很好地开发程序中的并行性,具有较高的效率。 相似文献
8.
9.
10.
11.
David R. Hanson 《Software》1983,13(8):745-763
Program optimization has received a great deal of attention for many years, which has resulted in numerous advances in compiler technology. The effectiveness of various simple optimizations has received comparably little attention during the same time period. The simplicity of most programs suggests that straightforward optimizations pay the greatest dividends. This paper describes three such optimizations suitable for one-pass compilers. The optimizations involve expression rearrangement, instruction selection, and the use of a cache for the allocation of resources. The cost of these optimizations is low; none require major changes to the size or structure of the compiler or reduce compilation speed by more than 10%. The benefits are high; each optimization results in at least a 10% average reduction in object code size and a corresponding reduction in execution time. Examples and implementation details are also described. 相似文献
12.
The explosive growth in network bandwidth and Internet services such as QoS (quality of service) and SLA (service level agreement) monitoring have created the need for new networking hardware called a Network Processing Unit (NPU). In order to rapidly reconfigure the NPU for frequently varying Internet services and technologies, a high-performance C compiler is urgently needed. Several code generation techniques, which are intended to meet the high code quality demands of other types of application specific instruction-set processors (ASIPs) like digital signal processors (DSPs), have already been developed. However, these techniques are insufficient for NPUs due to striking architectural differences such as asymmetric data paths. The main purpose of this paper is to discuss our recent experience with the development of a commercial compiler for a new NPU called the Paion PPII, which is basically a packet engine for NPU to meet the growing need for new high-bandwidth communication equipment targeted for Internet routers and ethernet adapters. For this purpose, we will first show the architectural challenges posed by the target NPU. Then, we will describe several compiler techniques that we found to be effective for the target NPU with various unorthogonal architectural features. The current implementations of the PPII use a VLIW (Very Long Instruction Word) architecture. So, we handled this VLIW-style architecture by employing a simple code compaction scheme which packs multiple parallel instructions into one long instruction word. The experimental results show that our techniques are effective for significantly reducing the dynamic instruction count. Copyright © 2004 John Wiley & Sons, Ltd. 相似文献
13.
Optimizing compilers increase the resulting code performance by carrying out a number of code optimization techniques. Profile information assistance for code optimizations gives an opportunity to greatly increase the code performance in some cases. However, the impossibility to provide a representative training execution often leads to the decline in efficiency of profile-dependent code optimizations. This paper investigates the main causes of the performance loss for the one-stage optimization as compared to the profileguided optimization (PGO) and introduces some alternative compilation techniques to reduce this loss. The effectiveness of these techniques is evaluated for a VLIW-architecture Elbrus compiler. 相似文献
14.
15.
Instruction-level parallel processing: History,overview, and perspective 总被引:11,自引:0,他引:11
Instruction-level parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP had become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades. 相似文献
16.
17.
18.
Most of the current cloud computing providers allocate virtual machine instances to their users through fixed-price allocation mechanisms. We argue that combinatorial auction-based allocation mechanisms are especially efficient over the fixed-price mechanisms since the virtual machine instances are assigned to users having the highest valuation. We formulate the problem of virtual machine allocation in clouds as a combinatorial auction problem and propose two mechanisms to solve it. The proposed mechanisms are extensions of two existing combinatorial auction mechanisms. We perform extensive simulation experiments to compare the two proposed combinatorial auction-based mechanisms with the currently used fixed-price allocation mechanism. Our experiments reveal that the combinatorial auction-based mechanisms can significantly improve the allocation efficiency while generating higher revenue for the cloud providers. 相似文献
19.
税控收款机嵌入式系统的设计与实现 总被引:2,自引:0,他引:2
嵌入式系统的应用领域越来越广泛。嵌入式系统的开发首先要选取合适的微处理器与操作系统;文章根据税控收款机对软硬件的要求,选取S3C44B0X微处理器与NucleusPLUS操作系统,介绍了税控收款机系统的设计与实现。 相似文献
20.
提出了很多结合技术使得指令调度与寄存器分配之间进行一些信息交互,在没有引入过多溢出代码的情况下提高了指令级并行度,从而提高了性能。按照算法的特征分类介绍了几种影响力较大的算法,同时作了简单的评价和效果比较,最后介绍了有关指令调度和寄存器分配结合的一些新方向。 相似文献