Similar Literature
1.
The instruction control pipeline is a pipeline designed inside a general-purpose EPIC processor specifically for the instruction control system; it runs in lock-step with the execution pipeline and carries shared information and global control information. This paper proposes an instruction control pipeline technique adopted in the design of a general-purpose EPIC microprocessor and describes its concrete design and implementation. Practical use shows that the instruction control pipeline technique can effectively reduce the design complexity of an EPIC microprocessor.

2.
EPIC (Explicitly Parallel Instruction Computing) is a new design philosophy in today's high-performance microprocessor technology. This paper analyzes the architectural features of the Itanium processor, which is based on the EPIC design philosophy, and introduces applications of the Itanium high-performance microprocessor.

3.
An Accurate Microprocessor Model with Branch Prediction   (Cited by: 3; self-citations: 0; citations by others: 3)
In today's deeply pipelined, wide-issue microprocessors, accurate branch prediction is an indispensable key technique for achieving high performance. A branch misprediction wastes a large number of clock cycles and prevents out-of-order execution from delivering its benefit. The effective performance of a wide-issue microprocessor also depends on the instruction window size and the instruction fetch width. This paper proposes a new, more accurate microprocessor model that accounts for branch prediction and the cycle penalty of branch mispredictions. Based on the statistical rule that instruction execution bandwidth scales with the square root of the number of available instructions in the instruction window, it gives a more precise algorithm describing the relationship among fetch bandwidth, branch prediction accuracy, branch misprediction penalty, instruction window size, and IPC, and discusses the trade-offs among these parameters and their effect on program IPC. From this, the fetch bandwidth threshold that depends on multiple microprocessor parameters and the choice of several key processor parameters can be determined.
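As a rough illustration of the square-root rule mentioned above, the C sketch below estimates sustainable IPC from the window size and discounts it by the branch misprediction penalty. The formula structure and all parameter values are illustrative assumptions, not the exact model from the paper.

    #include <math.h>
    #include <stdio.h>

    /* Illustrative model: execution bandwidth ~ sqrt(W), where W is the
     * number of useful instructions in the window; effective IPC is then
     * reduced by the cycles lost to branch mispredictions. */
    static double estimate_ipc(double window, double fetch_bw,
                               double branch_freq, double accuracy,
                               double penalty)
    {
        double exec_bw = sqrt(window);               /* square-root rule        */
        double ipc     = fmin(exec_bw, fetch_bw);    /* limited by fetch width  */
        double mpki    = branch_freq * (1.0 - accuracy) * 1000.0;
        double cycles  = 1000.0 / ipc + mpki * penalty; /* per 1000 instructions */
        return 1000.0 / cycles;
    }

    int main(void)
    {
        /* assumed parameters: 128-entry window, fetch 8/cycle, 20% branches,
         * 95% prediction accuracy, 15-cycle misprediction penalty */
        printf("estimated IPC = %.2f\n",
               estimate_ipc(128.0, 8.0, 0.20, 0.95, 15.0));
        return 0;
    }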

4.
This paper describes OLSM (Optimized Lock-Step execution Model), an optimized instruction lock-step execution model that can effectively improve memory-level parallelism (MLP), and builds a memory hierarchy that embodies the ideas of the OLSM model. OLSM allows an Explicitly Parallel Instruction Computing (EPIC) microprocessor to perform a limited degree of out-of-order execution, overcoming the drawback of the traditional lock-step execution of Very Long Instruction Word (VLIW) machines; it can fully exploit the abundant computation and storage resources of the architecture, hide memory latency as much as possible, and improve MLP.

5.
Design of an FPGA-Based Pipelined Microprocessor   (Cited by: 1; self-citations: 0; citations by others: 1)
Increasing instruction-level parallelism is an important direction in the development of microprocessor architecture and a key part of developing high-performance FPGA-based microprocessors. This paper discusses the instruction pipeline structure and system design of an FPGA-based pipelined microprocessor, and proposes detection algorithms and solutions for the hazards that arise during pipelined instruction execution. The pipelined microprocessor was simulated with a typical program; the results show a maximum throughput of one instruction completed per clock cycle, confirming the correctness and high performance of the design. The design is of practical value for developing future application-specific integrated circuits with microprocessing capability.
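The abstract mentions hazard detection algorithms without giving them. As a generic sketch of what such a check looks like in a five-stage design, the fragment below flags a read-after-write hazard between the instruction in decode and the older instructions still in EX or MEM, which the control logic would resolve by forwarding or stalling. The structure and field names are assumptions, not the paper's design.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        bool    writes_reg;     /* does this instruction write a register?  */
        uint8_t rd;             /* destination register                     */
        uint8_t rs1, rs2;       /* source registers of the decoding instr.  */
    } pipe_instr_t;

    /* RAW hazard check between the instruction in the decode stage and an
     * older instruction still in EX or MEM.  A real design would first try
     * forwarding and only stall when the result is not yet available.      */
    static bool raw_hazard(const pipe_instr_t *decode, const pipe_instr_t *older)
    {
        return older->writes_reg &&
               older->rd != 0 &&                      /* r0 hardwired to zero */
               (older->rd == decode->rs1 || older->rd == decode->rs2);
    }

    static bool must_stall(const pipe_instr_t *decode,
                           const pipe_instr_t *ex, const pipe_instr_t *mem)
    {
        return raw_hazard(decode, ex) || raw_hazard(decode, mem);
    }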

6.
单睿  洪缨  侯朝焕 《计算机学报》2003,26(11):1575-1580
In modern high-performance microprocessor design, predication and speculation have become two important techniques for exploiting instruction-level parallelism (ILP). The purpose of predication is to break the inherent control flow of a program, converting control dependences into data dependences so that ILP can be identified over a hyperblock rather than a single basic block. Speculative execution breaks the dependences caused by branches or memory accesses, and is further divided into control speculation and data speculation. Control speculation breaks the dependences between branches and other operations, allowing the compiler to identify parallelism within a hyperblock and reducing the height of control dependences; data speculation removes memory-access dependences to increase instruction-level parallelism. This paper first analyzes predication and speculation individually, then combines the two techniques and applies them to the design of a high-performance media processor. Performance evaluation and comparison show that combining the two techniques is more effective than using either one alone.
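A minimal sketch of the control speculation described above: the "then" value is computed before the branch outcome is known, and the branch only selects the result. For speculated loads, an EPIC compiler would emit a speculative load (ld.s) and a later check (chk.s) with recovery code, which plain C can only hint at in comments; the function and variable names are illustrative.

    /* Conceptual control speculation: compute the "then" value before the
     * branch outcome is known, then select.  A compiler for an EPIC target
     * performs this hoisting itself and, for loads, uses a speculative
     * load (ld.s) plus a check (chk.s) backed by compiler-generated
     * recovery code. */
    int speculate(int a, int b, int cond)
    {
        int spec = a * b + 7;        /* hoisted above the branch, may go unused */
        return cond ? spec : a;      /* the branch only selects the result      */
    }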

7.
陈海民  李峥  王瑞蛟 《计算机应用》2011,31(7):2004-2007
Targeting the specific application environment of a five-stage pipelined embedded microprocessor, this paper studies branch prediction in depth and proposes a new branch prediction scheme. The scheme is compatible with cache-based designs: by extending the instruction bus, it predicts the direction and target address of a branch instruction early in the fetch stage, and saves the instructions and address pointer of the path that may execute but has not yet executed so that they can be restored when the prediction fails, which reduces the misprediction penalty while guaranteeing correct execution of the instruction stream. The study shows that the scheme has low hardware overhead, high prediction efficiency, and a low misprediction cost.
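The abstract does not give the predictor's internals; as a generic point of reference, the sketch below implements the classic branch-target-buffer plus 2-bit saturating-counter scheme that fetch-stage predictors of this kind are usually compared against. The table size, indexing, and update policy are arbitrary assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define BTB_ENTRIES 256              /* assumed table size (power of two) */

    typedef struct {
        uint32_t tag;                    /* branch PC                         */
        uint32_t target;                 /* predicted target address          */
        uint8_t  counter;                /* 2-bit saturating counter, 0..3    */
    } btb_entry_t;

    static btb_entry_t btb[BTB_ENTRIES];

    /* Predict at fetch time: taken if the entry matches and counter >= 2. */
    static bool predict(uint32_t pc, uint32_t *target)
    {
        btb_entry_t *e = &btb[(pc >> 2) % BTB_ENTRIES];
        if (e->tag == pc && e->counter >= 2) {
            *target = e->target;
            return true;                 /* predicted taken                   */
        }
        return false;                    /* predicted not taken (fall through) */
    }

    /* Update when the branch resolves in a later pipeline stage. */
    static void update(uint32_t pc, bool taken, uint32_t target)
    {
        btb_entry_t *e = &btb[(pc >> 2) % BTB_ENTRIES];
        if (e->tag != pc) {              /* allocate on first sight of branch */
            e->tag = pc;
            e->counter = taken ? 2 : 1;
        } else if (taken && e->counter < 3) {
            e->counter++;
        } else if (!taken && e->counter > 0) {
            e->counter--;
        }
        if (taken)
            e->target = target;
    }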

8.
Branch prediction eliminates the cycles lost after a branch instruction and keeps the pipeline from stalling. A high branch prediction accuracy is essential to the performance of a high-performance microprocessor. This paper analyzes in detail the multi-level branch prediction mechanism of the Itanium processor and studies the concrete implementation of each level of predictor.

10.
Research on the EDSMT Microarchitecture   (Cited by: 1; self-citations: 0; citations by others: 1)
This paper proposes EDSMT, a multithreaded microprocessor microarchitecture. EDSMT effectively combines Explicitly Parallel Instruction Computing (EPIC) with dynamic simultaneous multithreading (DSMT), fully exploiting and effectively supporting multiple levels of parallelism through hardware-software cooperation. EDSMT can reduce the complexity of the hardware design while improving microprocessor performance.

11.
田祖伟  孙光 《计算机科学》2010,37(5):130-133
The large number of branch instructions in programs severely limits the ability of the architecture and the compiler to exploit parallelism. A major challenge in effectively exploiting instruction-level parallelism is overcoming the limitations imposed by branches. Predicated execution can effectively remove branches by converting branch instructions into predicated code, which enlarges the scope of instruction scheduling and eliminates the performance loss caused by branch mispredictions. This paper describes compiler optimization techniques based on predicated code, including instruction scheduling, software pipelining, register allocation, and instruction merging, and designs and implements an instruction scheduling algorithm for predicated code. Experiments show that optimizing predicated code can effectively increase instruction-level parallelism, shorten code execution time, and improve program performance.
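A minimal illustration of the if-conversion described above: the branchy form in the comment is rewritten into straight-line predicated form, which a compiler for a predicated ISA would emit as instructions guarded by predicate registers. The C below only models the idea with an explicit condition flag; the names are illustrative.

    /* Branchy form:
     *     if (a > b)  max = a;  else  max = b;
     *
     * If-converted (predicated) form: both assignments become straight-line
     * operations guarded by complementary predicates p and !p, so no branch
     * remains and the scheduler may freely interleave them with other code. */
    int predicated_max(int a, int b)
    {
        int p = (a > b);             /* predicate register p */
        int max = 0;
        max = p  ? a : max;          /* (p)  max = a         */
        max = !p ? b : max;          /* (!p) max = b         */
        return max;
    }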

12.
The Gmicro/500, which features a RISC-like dual-pipeline structure for high-speed execution of basic instructions and represents a significant advance for the TRON architecture, is presented. Upwardly-object-compatible with earlier members of the Gmicro series, this microprocessor uses resident dedicated branch buffers to greatly enhance branch instruction execution speed. Its microprograms simultaneously use dual execution blocks to execute high-level language instructions effectively. Fabricated with a 0.6-μm CMOS technology on a 10.9-mm×16-mm die, the chip operates at 50/66 MHz and achieves a processing rate of 100/132 MIPS.

13.
As microprocessor designs move towards deeper pipelines and support for multiple instruction issue, steps must be taken to alleviate the negative impact of branch operations on processor performance. One approach is to use branch prediction hardware and perform speculative execution of the instructions following an unresolved branch. Another technique is to eliminate certain branch instructions altogether by translating the instructions following a forward branch into predicate form. Both these techniques are employed in many current processor designs. This paper investigates the relationship between branch prediction techniques and branch predication. In particular, we are interested in how using predication to remove a certain class of poorly predicted branches affects the prediction accuracy of the remaining branches. A variety of existing predication models for eliminating branch operations are presented, and the effect that eliminating branches has on branch prediction schemes ranging from simple prediction mechanisms to the newer more sophisticated branch predictors is studied. We also examine the impact of predication on basic block size, and how the two techniques used together affect overall processor performance.

14.
Designers face many choices when planning a new high-performance, general purpose microprocessor. Options include superscalar organization (the ability to dispatch and execute more than one instruction at a time), out-of-order issue of instructions, speculative execution, branch prediction, and cache hierarchy. However, the interaction of multiple microarchitecture features is often counterintuitive, raising questions concerning potential performance benefits and other effects on various workloads. Complex design trade-offs require accurate and timely performance modeling, which in turn requires flexible, efficient environments for exploring microarchitecture processor performance. Workload-driven simulation models are essential for microprocessor design space exploration. A processor model must ideally: capture in sufficient detail those features that are already well defined; make evolving assumptions and approximations in interpreting the desired execution semantics for those features that are not yet well defined; and be validated against the existing specification. These requirements suggest the need for an evolving but reasonably precise specification, so that validating against such a specification provides confidence in the results. Processor model validation normally relies on behavioral timing specifications based on test cases that exercise the microarchitecture. This approach, commonly used in simulation-based functional validation methods, is also useful for performance validation. In this article, we describe a workload-driven simulation environment for PowerPC processor microarchitecture performance exploration. We summarize the environment's properties and give examples of its usage.

15.
In modern processors, operations are often executed before it is known whether their results will actually be needed. This expedient, called speculative execution, helps reveal parallelism at the instruction level. In EPIC architectures, speculative execution is completely controlled by the compiler, which makes it possible to avoid complex hardware mechanisms for supporting speculative execution. Moreover, the idea of speculative execution can be used by the compiler in machine-independent optimizations. The paper describes a scheme for constructing speculative optimizations that is based on selecting the control-flow and data-flow properties that matter from the optimization standpoint and on estimating the probabilities that they hold. The probabilities found are used for searching for and constructing advantageous speculative and bookkeeping transformations. For optimizations that include only speculative upward movements of instructions along the control flow graph, a method has been developed on the basis of the suggested scheme that includes algorithms for finding the probabilities of data and control dependences, for estimating the benefit of speculative movements, and for constructing recovery code. On the basis of this method, an algorithm for the speculative scheduling of instructions for the Intel Itanium architecture has been developed and implemented. Specific features of its implementation and experimental results are described.

16.
Dynamic optimization relies on runtime profile information to improve the performance of program execution. Traditional profiling techniques incur significant overhead and are not suitable for dynamic optimization. In this paper, a new profiling technique is proposed that combines the strengths of software and hardware to achieve near-zero-overhead profiling. The compiler passes profiling requests as a few bits of information in branch instructions to the hardware, and the processor executes the profiling operations asynchronously in available free slots or on dedicated hardware. The compiler instrumentation of this technique is implemented using an Itanium research compiler. The results show that accurate block profiling incurs very little overhead in the user program in terms of program scheduling cycles; for example, the average overhead is 0.6% for the SPECint95 benchmarks. The hardware support required for the new profiling is practical. The technique is extended to collect edge profiles for continuous phase transition detection. It is believed that this hardware-software collaborative scheme will enable many profile-driven dynamic optimizations for EPIC processors such as the Itanium processors.
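For contrast with the near-zero-overhead hardware-assisted scheme described above, the following sketch shows what conventional compiler-inserted block profiling looks like: a counter increment at the entry of every basic block, which is precisely the overhead the proposed technique moves into free slots or dedicated hardware. The counter array, block IDs, and example function are assumptions.

    #include <stdint.h>

    #define NUM_BLOCKS 1024                  /* assumed number of basic blocks */

    static uint64_t block_count[NUM_BLOCKS]; /* execution count per block      */

    /* Conventional software instrumentation: the compiler inserts an inlined
     * increment like this at the entry of every basic block.  The hardware-
     * assisted scheme instead encodes the profiling request in spare bits of
     * branch instructions and lets the processor perform the counting.       */
    static inline void profile_block(unsigned block_id)
    {
        block_count[block_id]++;
    }

    int sum_positive(const int *a, int n)
    {
        int s = 0;
        profile_block(0);                    /* entry block     */
        for (int i = 0; i < n; i++) {
            profile_block(1);                /* loop body block */
            if (a[i] > 0) {
                profile_block(2);            /* then block      */
                s += a[i];
            }
        }
        profile_block(3);                    /* exit block      */
        return s;
    }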

17.
张仕健  胡伟武 《计算机学报》2007,30(10):1674-1680
With the widespread adoption of deep submicron processes, transient faults have become a major cause of chip failure. This paper proposes a fault-tolerant microarchitecture that inserts redundant instructions after branch instructions, exploiting the processing bandwidth wasted by branch mispredictions to reduce the performance loss caused by redundant execution. Experimental results show that the performance loss of this technique is between 6% and 31%, 21% on average, which is clearly lower than that of MBI and comparable to that of DIE. The technique can detect transient faults occurring in every pipeline stage and recover the processor state, with short fault-detection latency and small hardware overhead, making it well suited to improving the fault tolerance of embedded microprocessors with simple prediction mechanisms.

18.
The speculated execution of threads in a multithreaded architecture, plus the branch prediction used in each thread execution unit, allows many instructions to be executed speculatively, that is, before it is known whether they are actually needed by the program. In this study, we examine how load instructions executed on what turn out to be incorrectly executed program paths impact memory system performance. We find that incorrect speculation (wrong execution) at the instruction and thread level provides an indirect prefetching effect for the later correct execution paths and threads. By continuing to execute the mispredicted load instructions even after the instruction or thread-level control speculation is known to be incorrect, the cache misses observed on the correctly executed paths can be reduced by 16 to 73 percent, with an average reduction of 45 percent. However, we also find that these extra loads can increase the amount of memory traffic and can pollute the cache. We introduce the small, fully associative wrong execution cache (WEC) to eliminate the potential pollution that can be caused by the execution of the mispredicted load instructions. Our simulation results show that the WEC can improve the performance of a concurrent multithreaded architecture by up to 18.5 percent on the benchmark programs tested, with an average improvement of 9.7 percent, due to the reductions in the number of cache misses.
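As a rough sketch of the wrong execution cache (WEC) idea, the fragment below models a small fully associative buffer that holds lines fetched by wrong-path loads so they can satisfy later correct-path misses without polluting the main data cache. The size, LRU policy, and interface are assumptions made for illustration, not the paper's exact design.

    #include <stdint.h>
    #include <stdbool.h>

    #define WEC_ENTRIES 8                    /* assumed: small, fully associative */

    typedef struct {
        bool     valid;
        uint64_t line_addr;                  /* cache-line address (tag)          */
        uint32_t lru;                        /* larger value = more recently used */
    } wec_entry_t;

    static wec_entry_t wec[WEC_ENTRIES];
    static uint32_t wec_clock;

    /* A wrong-path load that missed in the data cache fills the WEC instead,
     * so the main cache is not polluted with possibly useless lines.          */
    void wec_fill(uint64_t line_addr)
    {
        int victim = 0;
        for (int i = 0; i < WEC_ENTRIES; i++) {
            if (!wec[i].valid) { victim = i; break; }
            if (wec[i].lru < wec[victim].lru) victim = i;   /* LRU replacement */
        }
        wec[victim].valid = true;
        wec[victim].line_addr = line_addr;
        wec[victim].lru = ++wec_clock;
    }

    /* A correct-path miss first probes the WEC; a hit gives the indirect
     * prefetching effect described in the abstract.                           */
    bool wec_lookup(uint64_t line_addr)
    {
        for (int i = 0; i < WEC_ENTRIES; i++) {
            if (wec[i].valid && wec[i].line_addr == line_addr) {
                wec[i].lru = ++wec_clock;
                return true;
            }
        }
        return false;
    }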

19.
Decompilation converts a binary executable program into equivalent code in a higher-level language form and is an important direction in software reverse engineering research. Abstracting the semantics of machine instructions to produce an intermediate code representation is a key step in a decompiler. This paper describes the implementation techniques used during decompilation to generate a higher-level intermediate representation from IA-64 assembly code through semantic descriptions. By combining semantic description techniques with the EPIC features of the IA-64 architecture, the problem of semantic abstraction of EPIC machine instructions is effectively solved.
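As a toy illustration of the semantic abstraction step described above, the sketch below maps a single IA-64 instruction pattern to a register-transfer-style intermediate representation. The IR node layout and the single handled opcode are assumptions, not the paper's actual description language.

    #include <stdio.h>

    /* A minimal RTL-like IR node: dest <- src1 op src2, guarded by a predicate. */
    typedef struct {
        char        op;            /* '+' for add, etc.                  */
        const char *pred, *dest, *src1, *src2;
    } ir_node_t;

    /* Semantic mapping for one IA-64 pattern:
     *     (qp) add r_d = r_a, r_b   ==>   if (qp) r_d <- r_a + r_b        */
    static ir_node_t abstract_add(const char *qp, const char *rd,
                                  const char *ra, const char *rb)
    {
        ir_node_t n = { '+', qp, rd, ra, rb };
        return n;
    }

    int main(void)
    {
        ir_node_t n = abstract_add("p0", "r1", "r2", "r3");  /* (p0) add r1 = r2, r3 */
        printf("if (%s) %s <- %s %c %s\n", n.pred, n.dest, n.src1, n.op, n.src2);
        return 0;
    }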
