首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Design and implementation details of the MIPS R10000, 200-MHz, 64-b superscalar dynamic issue RISC microprocessor is presented. It fetches and decodes four instructions per cycle and dynamically issues them to five fully pipelined, low latency execution units, Its hierarchical nonblocking memory system helps hide memory latency with two levels of set-associative, write-back caches. The processor has over 6.8 M transistors and is built in 3.3-V, 0.30 μm, four-layer metal CMOS technology with under 30 W of power consumption. The processor delivers peak performance of Spec95int of 9 and Spec95fp of 19 operating at 200 MHz. Clock and power distribution as well as circuit design techniques of several blocks are addressed  相似文献   

2.
This 533-MHz BiCMOS very large scale integration (VLSI) implementation of the PowerPC architecture contains three pipelines and a large on-chip secondary cache to achieve a peak performance of 1600 MIPS. The 15 mm×10 mm die contains 2.7 M transistors (2M CMOS and 0.7 M bipolar) and dissipates less than 85 W. The die is fabricated in a six-level metal, 0.5-μm BiCMOS process and requires 3.6 and 2.1 V power supplies  相似文献   

3.
A quad-issue custom VLSI microprocessor is described. This microprocessor implements the Alpha architecture and achieves an estimated performance of 13.3 SPECint9S and 18.4 SPECfp95 at 433 MHz. The 9.6 million transistor die measures 14.4 mm×14.5 mm, and is fabricated in a 0.35-μm, four-metal layer CMOS process. This chip dissipates less than 25 W at 433 MHz using a 2.0 V internal power supply. The design was leveraged from a prior 300-MHz, 3.3-V, 0.50-μm CMOS design. It includes several significant architectural enhancements and required circuit solutions for operation at 2.0 V. The chip will operate at nominal internal power supply voltages up to 2.5 V allowing improved performance at the cost of increased power consumption. At 2.5 V, the chip operates at 500 MHz and delivers 15.4 SPECint95 (est) and 21.1 SPECfp95 (est). This paper describes the chip implementation details and the strategy for efficiently migrating the existing design to the 0.35-μm technology  相似文献   

4.
A 300-MHz 64-b quad-issue CMOS RISC microprocessor   总被引:1,自引:0,他引:1  
This 300 MHz quad-issue custom VLSI implementation of the Alpha architecture delivers 1200 MIPS (peak), 600 MFLOPS (peak), 341 SPECint92, and 512 SPECfp92. The 16.5 mm×18.1 mm die contains 9.3 M transistors and dissipates 50 W at 300 MHz. It is fabricated in a 3.3 V, four-layer metal, 0.5 μm, CMOS process. The upper metal layers (metal-3 and metal-4), primarily used for power, ground, and clock distribution. The chip supports 3.3 V/5.0 V interfaces and is packaged in a 499-pin ceramic IPGA. It contains an 8-kbyte instruction cache; an 8-kbyte, dual-ported, data cache; and a 96-kbyte, unified, second-level, 3-way set associative, fully pipelined, writeback cache. This paper describes the circuit and implementation techniques that were used to attain the 300 MHz operating frequency  相似文献   

5.
This paper describes circuits used to implement the motion video instructions (MVI) in the 550-MHz Alpha 21 164PC Microprocessor. The chip is fabricated in a 0.35-μm CMOS process and is the first implementation of the MVI instruction set in the Alpha architecture. The MVI instruction set, coupled with the high performance of the 550-MHz processor, delivers 30 frames/s digital video disk (DVD) playback with stereo-quality audio, enables video teleconferencing at 30 frames/s, and significantly improves the performance of MPEG encode algorithms  相似文献   

6.
This paper describes a 160 MHz 500 mW 32 b StrongARM(R) microprocessor designed for low-power, low-cost applications. The chip implements the ARM(R) V4 instruction set and is bus compatible with earlier implementations. The pin interface runs at 3.3 V but the internal power supplies can vary from 1.5 to 2.2 V, providing various options to balance performance and power dissipation. At 160 MHz internal clock speed with a nominal Vdd of 1.65 V, it delivers 185 Dhrystone 2.1 MIPS while dissipating less than 450 mW. The range of operating points runs from 100 MHz at 1.65 V dissipating less than 300 mW to 200 MHz at 2.0 V for less than 900 mW. An on-chip PLL provides the internal clock based on a 3.68 MHz clock input. The chip contains 2.5 million transistors, 90% of which are in the two 16 kB caches. It is fabricated in a 0.35-μm three-metal CMOS process with 0.35 V thresholds and 0.25 μm effective channel lengths. The chip measures 7.8 mm×6.4 mm and is packaged in a 144-pin plastic thin quad flat pack (TQFP) package  相似文献   

7.
This paper describes 3.3-V BiCMOS circuit techniques for a 120-MHz RISC microprocessor. The processor is implemented in a 0.5-μm BiCMOS technology with 4-metal-layer structure. The chip includes a 240 MFLOPS fully pipelined 64-b floating point datapath, a 240-MIPS integer datapath, and 24 KB cache, and contains 2.8 million transistors. The processor executes up to four operations at 120 MHz and dissipates 17 W. Novel BiCMOS circuits, such as a 0.6-ns single-ended common base sense amplifier, a 0.46-ns 22-b comparator, and a 0.7-ns path logic adder are applied to the processor. The processor with the proposed BiCMOS circuits has a 11%-47% shorter delay time advantage over a CMOS microprocessor  相似文献   

8.
This paper describes the performance improvements of a reduced instruction set computer (RISC) microprocessor that has migrated from a 2.5 V technology to a 1.8 V technology. The 1.8 V technology implements copper interconnects and low Vt field-effect transistors in speed-critical paths and has an Leff of 0.12 μm. Global clock latency and skew are improved by using copper wires, and early mode timings are improved by reducing clock skew and adding buffers. These enhancements, along with an environment of 2.0 V, 85°C, and with a fast process, produced a 480-MHz RISC microprocessor  相似文献   

9.
This paper describes a new IA-32 architecture microprocessor that implements 70 additional instructions to further accelerate the performance of data-streaming applications such as three-dimensional graphics and video encode/decode. This processor is an enhancement over the previous implementation of this family through the addition of these new instructions along with circuit improvements in several key areas for higher clock frequency. The 10.17×12.10 mm2 die contains 9.5 million transistors and is fabricated in a CMOS five-layer-metal 0.25-μm process with a six-layer organic land grid array package using C4 interconnect technology. It has an operating range of 1.4-2.2 V and is currently running up to 650 MHz  相似文献   

10.
A macropipelined CISC microprocessor was implemented in a 0.75-μm CMOS 3.3-V technology. The 1.3-million-transistor custom chip measures 1.62×1.46 cm2 and dissipates 16.3 W. The 100-MHz parts were benchmarked at 50 SPEC marks. The on-chip clocking system and several high-performance logic and circuit techniques are described. Macroinstruction handling, micropipeline management, and control store structures highlight the design architecture. The hierarchical array organization and fast tag comparison technique of the primary cache are discussed. Power estimation procedures are outlined, and the results are compared to measurements. Physical design and verification methods, and CAD tools are also described. After extensive functional verification efforts are described, chip and system test results are presented  相似文献   

11.
A superscalar RISC processor contains 2.8 million transistors in a die size of 16.2 mm×16.5 mm, and utilizes 3.3 V/0.5 μm BiCMOS technology. In order to take advantage of superscalar performance without incurring penalties from a slower clock or a longer pipeline, a tag bit is implemented in the instruction cache to indicate dependency between two instructions. A performance gain of up to 37% is obtained with only a 3.5% area overhead from our superscalar design  相似文献   

12.
The HP-PA8000 is a 180-MHz quad-issue custom VLSI implementation of the HP-PA 2.0 64-b architecture delivering 11.84 SPECint95 and 20.18 SPECfp95 with 3.8 million transistors integrated on a 17.68 mm×19.1 mm die in a 3.3-V, 0.5-μm CMOS process. Specialized clock circuits and extensive use of dynamic logic are key factors in this microprocessor's performance. Attention to clock analysis and distribution resulted in a 170 ps clock skew between any two clock nodes. This microprocessor utilizes a 56-entry instruction reorder buffer (IRE), register renaming, and dual functional units to fully exploit instruction level parallelism  相似文献   

13.
A 32-b RISC/DSP microprocessor with reduced complexity   总被引:2,自引:0,他引:2  
This paper presents a new 32-b reduced instruction set computer/digital signal processor (RISC/DSP) architecture which can be used as a general purpose microprocessor and in parallel as a 16-/32-b fixed-point DSP. This has been achieved by using RISC design principles for the implementation of DSP functionality. A DSP unit operates in parallel to an arithmetic logic unit (ALU)/barrelshifter on the same register set. This architecture provides the fast loop processing, high data throughput, and deterministic program flow absolutely necessary in DSP applications. Besides offering a basis for general purpose and DSP processing, the RISC philosophy offers a higher degree of flexibility for the implementation of DSP algorithms and achieves higher clock frequencies compared to conventional DSP architectures. The integrated DSP unit provides instruction set support for highly specialized DSP algorithms. Subword processing optimized for DSP algorithms has been implemented to provide maximum performance for 16-b data types. While creating a unified base for both application areas, we also minimized transistor count and we reduced complexity by using a short instruction pipeline. A parallelism concept based on a varying number of instruction latency cycles made superscalar instruction execution superfluous  相似文献   

14.
A microprocessor implementing IBM S/390 architecture operates in a 10+2 way system at frequencies up to 411 MHz (2.43 ns). The chip is fabricated in a 0.2-μm Leff CMOS technology with five layers of metal and tungsten local interconnect. The chip size is 17.35 mm×17.30 mm with about 7.8 million transistors. The power supply is 2.5 V and measured power dissipation at 300 MHz is 37 W. The microprocessor features two instruction units (IUs), two fixed point units (FXUs), two floating point units (FPUs), a buffer control element (BCE) with a unified 64-KB L1 cache, and a register unit (RU). The microprocessor dispatches one instruction per cycle. The dual-instruction, fixed, and floating point units are used to check each other to increase reliability and not for improved performance. A phase-locked-loop (PLL) provides a processor clock that runs at 2× the system bus frequency. High-frequency operation was achieved through careful static circuit design and timing optimization, along with limited use of dynamic circuits for highly critical functions, and several different clocking/latching strategies for cycle time reduction. Timing-driven synthesis and placement of the control logic provided the maximum flexibility with minimum turnaround time. Extensive use of self-resetting CMOS (SRCMOS) circuits in the on-chip L1 cache provides a 2.0-ns access time and up to 500 MHz operation  相似文献   

15.
A 400-MIPS/200-MFLOPS (peak) custom 64-b VLSI CPU is described. The chip is fabricated in a 0.75-μm CMOS technology utilizing three levels of metalization and optimized for 3.3-V operation. The die size is 16.8 mm×13.9 mm and contains 1.68 M transistors. The chip includes separate 8-kbyte instruction and data caches and a fully pipelined floating-point unit (FPU) that can handle both IEEE and VAX standard floating-point data types. It is designed to execute two instructions per cycle among scoreboarded integer, floating-point, address, and branch execution units. Power dissipation is 30 W at 200-MHz operation  相似文献   

16.
A 28 mW/MHz at 80 MHz structured-custom RISC microprocessor design is described. This 32-b implementation of the PowerPC architecture is fabricated in a 3.3 V, 0.5 μm, 4-level metal CMOS technology, resulting in 1.6 million transistors in a 7.4 mm by 11.5 mm chip size. Dual 8-kilobyte instruction and data caches coupled to a high performance 32/64-b system bus and separate execution units (float, integer, loadstore, and system units) result in peak instruction rates of three instructions per clock cycle. Low-power design techniques are used throughout the entire design, including dynamically powered down execution units. Typical power dissipation is kept under 2.2 W at 80 MHz. Three distinct levels of software-programmable, static, low-power operation-for system power management are offered, resulting in standby power dissipation from 2 mW to 350 mW. CPU to bus clock ratios of 1×, 2×, 3×, and 4× are implemented to allow control of system power while maintaining processor performance. As a result, workstation level performance is packed into a low-power, low-cost design ideal for notebooks and desktop computers  相似文献   

17.
A full-custom single-chip bipolar ECL RISC microprocessor was implemented in a 1.0-μm single-poly bipolar technology. This research prototype contains a CPU and on-chip 2-KB instruction and 2-KB data caches. Worst-case power dissipation with a nominal -5.2 V supply is 115 W. The chip has been designed for a worst-case clock frequency of 275 MHz at a nominal supply. The chip verifies a new style of CAD tools developed during the design process, advanced packaging techniques for high-power microprocessors, and VLSI ECL circuit techniques  相似文献   

18.
A custom 529 K-transistor microprocessor with a five-stage pipeline has been implemented on a 12.98-mm2 die. Employing BiCMOS macrocells, a 32-b execution unit, extensible ROM, RAM, a PLL (phase-locked loop) clock generator with bipolar drivers, and sense circuits, and a peak performance of 70 MIPS (million instructions per second) are achieved. Power consumption is 2.1 W at 40 MHz  相似文献   

19.
A 1-million transistor 64-b microprocessor has been fabricated using 0.8-μm double-metal CMOS technology. A 40-MIPS (million instructions per second) and 20-MFLOPS (million floating-point operations per second) peak performance at 40 MHz is realized by a self-clocked register file and two translation lookaside buffers (TLBs) with word-line transition detection circuits. The processor contains an integer unit based on the SPARC (scalable processor architecture) RISC (reduced instruction set computer) architecture, a floating-point unit (FPU) which executes IEEE-754 single- and double-precision floating-point operations a 6-KB three-way set-associative physical instruction cache, a 2-KB two-way set-associative physical data cache, a memory management unit that has two TLBs, and a bus control unit with an ECC (error-correcting code) circuit  相似文献   

20.
This paper describes a new architecture for embedded reconfigurable computing, based on a very-long instruction word (VLIW) processor enhanced with an additional run-time configurable datapath. The reconfigurable unit is tightly coupled with the processor, featuring an application-specific instruction-set extension. Mapping computation intensive algorithmic portions on the reconfigurable unit allows a more efficient elaboration, thus leading to an improvement in both timing performance and power consumption. A test chip has been implemented in a standard 0.18-/spl mu/m CMOS technology. The test of a signal processing algorithmic benchmark showed speedups ranging from 4.3/spl times/ to 13.5/spl times/ and energy consumption reduced up to 92%.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号