首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Designers face many choices when planning a new high-performance, general purpose microprocessor. Options include superscalar organization (the ability to dispatch and execute more than one instruction at a time), out-of-order issue of instructions, speculative execution, branch prediction, and cache hierarchy. However, the interaction of multiple microarchitecture features is often counterintuitive, raising questions concerning potential performance benefits and other effects on various workloads. Complex design trade-offs require accurate and timely performance modeling, which in turn requires flexible, efficient environments for exploring microarchitecture processor performance. Workload-driven simulation models are essential for microprocessor design space exploration. A processor model must ideally: capture in sufficient detail those features that are already well defined; make evolving assumptions and approximations in interpreting the desired execution semantics for those features that are not yet well defined; and be validated against the existing specification. These requirements suggest the need for an evolving but reasonably precise specification, so that validating against such a specification provides confidence in the results. Processor model validation normally relies on behavioral timing specifications based on test cases that exercise the microarchitecture. This approach, commonly used in simulation-based functional validation methods, is also useful for performance validation. In this article, we describe a workload driven simulation environment for PowerPC processor microarchitecture performance exploration. We summarize the environment's properties and give examples of its usage  相似文献   

2.
Historically, advances in integrated circuit technology have driven improvements in processor microarchitecture and led to todays microprocessors with sophisticated pipelines operating at very high clock frequencies. However, performance improvements achievable by high-frequency microprocessors have become seriously limited by main-memory access latencies because main-memory speeds have improved at a much slower pace than microprocessor speeds. Its crucial to deal with this performance disparity, commonly known as the memory wall, to enable future high-frequency microprocessors to achieve their performance potential. To overcome the memory wall, we propose kilo-instruction processors-superscalar processors that can maintain a thousand or more simultaneous in-flight instructions. Doing so means designing key hardware structures so that the processor can satisfy the high resource requirements without significantly decreasing processor efficiency or increasing energy consumption.  相似文献   

3.
Papworth  D.B. 《Micro, IEEE》1996,16(2):8-15
Designing a wholly new microprocessor is difficult and expensive. To justify this effort, a major new microarchitecture must improve performance one and a half or two times over the previous-generation microarchitecture, when evaluated on equivalent process technology. In addition, semiconductor process technology continues to evolve while the processor design is in progress. The previous-generation microarchitecture increases in clock speed and performance due to compactions and conversion to newer technology. A new microarchitecture must “intercept” the process technology to achieve a compounding of process and microarchitectural speedups. This paper looks at a large microprocessor development project which reveals some of the reasoning (for goals, changes, trade-offs, and performance simulation) that lay behind its final form  相似文献   

4.
In this paper we present a design for a general-purpose fuzzy processor, the core of which is based on an analog-numerical approach combining the inherent advantages of analog and digital implementations, above all as regards noise margins. The architectural model proposed was chosen in such a way as to obtain a processor capable of working with a considerable degree of parallelism. The internal structure of the processor is organized as a cascade of pipeline stages which perform parallel execution of the processes into which each inference can be decomposed. A particular feature of the project is the definition of a `fuzzy-gate', which executes elementary fuzzy computations, on which construction of the whole core of the processor is based. Designed using CMOS technology, the core can be integrated into a single chip and can easily be extended. The performance obtainable, in the order of 50 Mega fuzzy rules per second, is of a considerable level  相似文献   

5.
6.
As semiconductor technology advances, more devices can be accommodated in a single VLSI chip. The feasibility of putting multifunctional units in a chip is then worth studying. Such an approach, however, will face software and hardware difficulties (and also tradeoffs). CRISC is a 32-bit single-chip VLSI processor architecture achieving high performance by means of RISC and multiple functional unit approaches. Dual-ALUs are used to execute instructions concurrently for fine-grained parallelism. Up to three instructions can be executed simultaneously by CRISC. Here, CRISC architecture design considerations and instruction cache scheme are investigated. Final microarchitecture and its incorporated software technique to produce object code for fine-grained parallel execution are described; its upper bound performance is estimated by an architectural model. A preliminary evaluation of the CRISC is also conducted, showing most satisfying results.  相似文献   

7.
描述了Intel Pentium4的NetBurst微体系结构,着重分析其内部实现的细节,回顾了Intel处理器的设计,分析了NetBurst微体系结构,再结合其内部功能单元,分析了其指令流水的过程,最后,给出了Pentium4的性能评测。  相似文献   

8.
An experimental 4-bit Josephson processor that demonstrates the possibility of a Josephson computer system with a gigahertz clock is presented. Constructed from 2066 gates on a 5-mm×5-mm die, it implements an eight-instruction set to enable the basic operations of digital signal processing. The basic structure and operating principles of the Josephson device are reviewed, and the design of and operating results for the processor are presented. The focus is mainly on the circuit techniques required to realize a gigahertz clock in a Josephson processor. Circuit techniques to suppress the crosstalk from the AC power to the I/O lines and ways to improve the clock frequency are introduced. The problems remaining in the effort to enhance performance are discussed  相似文献   

9.
Microarchitecture of the Godson-2 Processor   总被引:26,自引:3,他引:23       下载免费PDF全文
The Godson project is the first attempt to design high performance general-purpose microprocessors in China. This paper introduces the microarchitecture of the Godson-2 processor which is a 64-bit, 4-issue, out-of-order execution RISC processor that implements the 64-bit MlPS-like instruction set. The adoption of the aggressive out-of-order execution techniques (such as register mapping, branch prediction, and dynamic scheduling) and cache techniques (such as non-blocking cache, load speculation, dynamic memory disambiguation) helps the Godson-2 processor to achieve high performance even at not so high frequency. The Godson-2 processor has been physically implemented on a 6-metal 0.18μm CMOS technology based on the automatic placing and routing flow with the help of some crafted library cells and macros. The area of the chip is 6,700 micrometers by 6,200 micrometers and the clock cycle at typical corner is 2.3ns.  相似文献   

10.
嵌入式系统对处理器功耗开销有严格的限制,异步电路技术可以作为设计低功耗处理器的有效方法之一。针对嵌入式多媒体应用,本文设计实现了一款低功耗异步微处理器——腾越-Ⅱ。处理器中包含一个异步TTA微处理器内核、一个同步TTA微处理器内核、两个存储控制器和多个外部通信接口。异步内核通过基于宏单元的异步电路设计方法实现,其它部分通过基于标准单元的半定制设计流程实现。处理器芯片采用UMC0.18μmCMOS工艺实现,基片面积为4.89×4.89mm2,工作电压为1.8V。经测试,处理器工作主频达到200MHz,且异步内核的功耗开销低于同步内核的50%。  相似文献   

11.
McNairy  C. Soltis  D. 《Micro, IEEE》2003,23(2):44-55
The Itanium 2 processor extends the processing power of the Itanium processor family with a capable and balanced microarchitecture. Executing up to six instructions at a time, it provides both performance and binary compatibility for Itanium-based applications and operating systems.  相似文献   

12.
Simultaneous Multi-Threading (SMT) is a hardware technique that increases processor throughput by issuing instructions simultaneously from multiple threads. However, while SMT can be added to an existing microarchitecture with relatively low overhead, this additional chip area could be used for other resources such as more functional units, larger caches, or better branch predictors. How large is the SMT overhead and at what point does SMT no longer pay off for maximum throughput compared to adding other architecture features? This paper evaluates the silicon overhead of SMT by performing a transistor/interconnect-level analysis of the layout. We discuss microarchitecture issues that impact SMT implementations and show how the Instruction Set Architecture (ISA) and microarchitecture can have a large effect on the SMT overhead and performance. Results show that SMT yields large performance gains with small to moderate area overhead  相似文献   

13.
Processor Design in 3D Die-Stacking Technologies   总被引:4,自引:0,他引:4  
Three-dimensional integration is an emerging fabrication technology that vertically stacks multiple integrated chips. The benefits include an increase in device density; much greater flexibility in routing signals, power, and clock; the ability to integrate disparate technologies; and the potential for new 3D circuit and microarchitecture organizations. This article provides a technical introduction to the technology and its impact on processor design. Although our discussions here primarily focus on high-performance processor design, most of the observations and conclusions apply to other microprocessor market segments.  相似文献   

14.
The authors describe the low-power design of the synergistic processor element (SPE) of the cell processor developed by Sony, Toshiba and IBM. CMOS static gates implement most of the logic, and dynamic circuits are used in critical areas. Tight coupling of the instruction set architecture, microarchitecture, and physical implementation achieves a compact, power-efficient design.  相似文献   

15.
提出集束式整数线性规划形式化模型,利用指令间的功能依赖性解决专用指令集处理器中指令集自动定制的指数性空间问题.在此基础上,针对其前端和后端分别提出了相应的指令定制实现策略.实验结果表明,该指令定制方法可以有效地实现专用指令集的自动设计,并使最终处理器的运算性能得到优化.  相似文献   

16.
为了追求更高的性能,处理器核的主频不断提升,处理器核的设计日益复杂,随之而来的是功耗问题越来越突出。除了在工艺级和电路级采用低功耗技术外,在逻辑设计阶段通过分析处理器核各个功能模块的特点并采用相应的技术手段,也可以有效降低功耗。对一款乱序超标量处理器核中功耗比较突出的模块——寄存器文件和再定序缓冲——进行了逻辑设计优化,在程序运行性能几乎不受影响的情况下明显减少了面积,降低了功耗。  相似文献   

17.
随着处理器的快速发展,RISC-V的软件生态环境建设成为其在处理器市场中站稳脚跟的关键因素之一。二进制翻译是解决处理器二进制代码兼容性问题、为处理器生态环境建设获取时间成本的关键技术之一,但由于二进制翻译器难以以较低的功耗面积开销获得高效执行的二进制代码,使其无法广泛应用于嵌入式领域。针对二进制翻译器执行效率和功耗面积开销难以取得平衡的问题,采用硬件逻辑加速的方式处理ARMv7-M中条件执行指令、更新标志位指令以及桶形移位指令,并利用静态二进制翻译器对ARMv7-M程序进行IT Block分裂、地址重计算及指令映射后生成RISC-V二进制代码,以此支持ARMv7-M的各类指令。基于开源内核CV32E40P设计了一个支持ARMv7-M的处理器内核,结果表明,运行ARMv7-M程序的平均性能能够达到直接运行RISC-V程序性能的137%,与纯软件二进制翻译支持ARMv7-M相比,该处理器核运行ARMv7-M程序的性能提升了5.59倍。  相似文献   

18.
彭元喜  邹佳骏 《计算机应用》2010,30(7):1978-1982
X型DSP是我们自主研发的一款低功耗高性能DSP。对X型DSP的CPU体系结构进行了深入研究,在详细分析X型DSP的ALU部件和移位器部件相关指令基础上,对ALU与移位器部件进行了设计与实现。采用Design Compiler综合工具,基于SMIC公司0.13um CMOS工艺库对ALU移位部件进行了逻辑综合,电路功耗共为4.2821mW,电路面积为71042.9804m2,工作频率达到250MHz。  相似文献   

19.
王秀丽 《软件》2011,32(4):56-58
随着EDA技术及微电子技术的飞速发展,现场可编程门阵列(Field Programmable Gate Array,简称FPGA)的性能有了大幅度的提高。以Nios Ⅱ软核处理器为核心的SOPC(Systemon Programmable Chip)系统便是把嵌入式系统应用在FPGA上的典型例子,本文设计的指纹识别模块就是基于FPGA的Nios Ⅱ处理器为核心的SOPC设计。通过IP核技术和灵活的软硬件编程,实现Nios Ⅱ对FPGA外围器件的控制,利用SOPC Builder将Nios Ⅱ处理器、指纹读取接口UART、键盘与LCD显示接口、FLASH接口、SDRAM控制器构建成Nios Ⅱ硬件系统,后者是电源和时钟电路、SDRAM存储器电路、FLASH存储器电路、LCD显示电路、指纹传感器电路、FPGA配置电路这些纯实物硬件设计。  相似文献   

20.
This paper introduces the microarchitecture and physical implementation of the Godson-2E processor, which is a four-issue superscalar RISC processor that supports the 64-bit MIPS instruction set. The adoption of the aggressive out-of-order execution and memory hierarchy techniques help Godson-2E to achieve high performance. The Godson-2E processor has been physically designed in a 7-metal 90nm CMOS process using the cell-based methodology with some bitsliced manual placement and a number of crafted cells and macros. The processor can be run at 1GHz and achieves a SPEC CPU2000 rate higher than 500.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号