首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A 1.3-GHz fifth-generation SPARC64 microprocessor   总被引:1,自引:0,他引:1  
A fifth-generation SPARC64 processor is fabricated in 130-nm partially depleted silicon-on-insulator CMOS with eight layers of Cu metallization. At V/sub dd/ = 1.2 V and T/sub a/ = 25/spl deg/C, it runs at 1.3 GHz and dissipates 34.7 W. The chip contains 191 M transistors with 19 M logic circuits in an area of 18.14 mm /spl times/ 15.99 mm and is covered with 5858 bumps, of which 269 are for I/O signals. It is mounted in a 1360-pin land-grid-array package. The 16-byte-wide system bus operates with a 260-MHz clock in single-data-rate or double-data-rate modes. This processor implements an error-detection mechanism for execution units and data path logic circuits in addition to on-chip arrays to detect data corruption. Intermittent errors detected in execution units and data paths are recovered via instruction retry. A soft barrier clocking scheme allows amortization of the clock skew and jitter over multiple cycles and helps to achieve high clock frequency. Tunability of the clock timing makes timing closure easier. A relatively small amount of custom circuit design and the use of mostly static circuits contributes to achieve short development time.  相似文献   

2.
The HP-PA8000 is a 180-MHz quad-issue custom VLSI implementation of the HP-PA 2.0 64-b architecture delivering 11.84 SPECint95 and 20.18 SPECfp95 with 3.8 million transistors integrated on a 17.68 mm×19.1 mm die in a 3.3-V, 0.5-μm CMOS process. Specialized clock circuits and extensive use of dynamic logic are key factors in this microprocessor's performance. Attention to clock analysis and distribution resulted in a 170 ps clock skew between any two clock nodes. This microprocessor utilizes a 56-entry instruction reorder buffer (IRE), register renaming, and dual functional units to fully exploit instruction level parallelism  相似文献   

3.
A 167 MHz 64 b VLSI CPU chip is described. The chip executes a 333-MFLOPS (peak) with an estimated system performance of 270SPECint92/380SPECfp92 (@167 MHz, 2 MB E-cache). The 17.7×17.8 mm die is fabricated with a 0.5 micron CMOS technology with four metal layers and contains 5.2 M transistors. The superscalar processor is capable of sustaining an execution rate of four instructions per cycle even in the presence of conditional branches and cache misses. Four fully pipelined 8×16 b multipliers and four single-cycle latency 16 b adders combine to speed up image processing, 2-D, 3-D graphics, video compression/decompression by up to an order of magnitude. High clock speed was obtained by the use of delayed reset logic, a new register file design; and novel comparators. Strict design methodology allowed fully functional first silicon which met all speed targets. The power dissipation of the chip is 28 W  相似文献   

4.
This fourth-generation processor combines two enhanced third-generation cores using an advanced 90-nm dual-Vt, dual-gate-oxide technology. Hardware additions feature expanded caches and inclusion of a 2-MB Level-2 cache and a Level-3 tag. Layout was completely redrawn to optimize the design for manufacturability and performance in the latest technology. Special emphasis was placed on library development to improve automation and assist in custom design. The memory design methodologies were completely updated to make quality design simpler and more robust. The chip operates at 1.8 GHz while dissipating 90 W of power at 1.1 V.  相似文献   

5.
A quad-issue custom VLSI microprocessor is described. This microprocessor implements the Alpha architecture and achieves an estimated performance of 13.3 SPECint9S and 18.4 SPECfp95 at 433 MHz. The 9.6 million transistor die measures 14.4 mm×14.5 mm, and is fabricated in a 0.35-μm, four-metal layer CMOS process. This chip dissipates less than 25 W at 433 MHz using a 2.0 V internal power supply. The design was leveraged from a prior 300-MHz, 3.3-V, 0.50-μm CMOS design. It includes several significant architectural enhancements and required circuit solutions for operation at 2.0 V. The chip will operate at nominal internal power supply voltages up to 2.5 V allowing improved performance at the cost of increased power consumption. At 2.5 V, the chip operates at 500 MHz and delivers 15.4 SPECint95 (est) and 21.1 SPECfp95 (est). This paper describes the chip implementation details and the strategy for efficiently migrating the existing design to the 0.35-μm technology  相似文献   

6.
A 400-MIPS/200-MFLOPS (peak) custom 64-b VLSI CPU is described. The chip is fabricated in a 0.75-μm CMOS technology utilizing three levels of metalization and optimized for 3.3-V operation. The die size is 16.8 mm×13.9 mm and contains 1.68 M transistors. The chip includes separate 8-kbyte instruction and data caches and a fully pipelined floating-point unit (FPU) that can handle both IEEE and VAX standard floating-point data types. It is designed to execute two instructions per cycle among scoreboarded integer, floating-point, address, and branch execution units. Power dissipation is 30 W at 200-MHz operation  相似文献   

7.
We report the first implementation of the new SPARC V9 64-b instruction set architecture. The HaL processor called SPARC64 is a ceramic Multi-Chip Module (MCM) that contains one CPU chip, one Memory Management Unit (MMU) chip, and four 64 KB Cache chips. Together, they implement a unique three-level address translation scheme that efficiently supports using virtual addresses spread anywhere in the full 64-b address range. The processor assigns a serial number to each issued instruction to track up to 64 in-progress instructions and can speculatively issue through up to 16 branches. It issues up to 4 instructions per cycle and utilizes superscalar instruction issue, register renaming, and dataflow (potentially out-of-order) execution to fully exploit instruction-level parallelism. The processor maintains a precise-state execution model, and commits in-order, up to 9 instructions in a cycle. In a HaL R1 system, a production SPARC64 running at 143 MHz has a performance of 230 SPECint92 and 300 SPECfp92 and dissipates 50 W from a 3.3 V supply  相似文献   

8.
A 300-MHz 64-b quad-issue CMOS RISC microprocessor   总被引:1,自引:0,他引:1  
This 300 MHz quad-issue custom VLSI implementation of the Alpha architecture delivers 1200 MIPS (peak), 600 MFLOPS (peak), 341 SPECint92, and 512 SPECfp92. The 16.5 mm×18.1 mm die contains 9.3 M transistors and dissipates 50 W at 300 MHz. It is fabricated in a 3.3 V, four-layer metal, 0.5 μm, CMOS process. The upper metal layers (metal-3 and metal-4), primarily used for power, ground, and clock distribution. The chip supports 3.3 V/5.0 V interfaces and is packaged in a 499-pin ceramic IPGA. It contains an 8-kbyte instruction cache; an 8-kbyte, dual-ported, data cache; and a 96-kbyte, unified, second-level, 3-way set associative, fully pipelined, writeback cache. This paper describes the circuit and implementation techniques that were used to attain the 300 MHz operating frequency  相似文献   

9.
This third-generation 1.1-GHz 64-bit UltraSPARC microprocessor provides 1-MB on-chip level-2 cache, 4-Gb/s off chip memory bandwidth, and a new 200 MHz JBus interface that supports one to four processors. The 87.5-million transistor chip is implemented in a seven-layer-metal copper 0.13-/spl mu/m CMOS process and dissipates 53 W at 1.3 V and 1.1 GHz.  相似文献   

10.
A 1-million transistor 64-b microprocessor has been fabricated using 0.8-μm double-metal CMOS technology. A 40-MIPS (million instructions per second) and 20-MFLOPS (million floating-point operations per second) peak performance at 40 MHz is realized by a self-clocked register file and two translation lookaside buffers (TLBs) with word-line transition detection circuits. The processor contains an integer unit based on the SPARC (scalable processor architecture) RISC (reduced instruction set computer) architecture, a floating-point unit (FPU) which executes IEEE-754 single- and double-precision floating-point operations a 6-KB three-way set-associative physical instruction cache, a 2-KB two-way set-associative physical data cache, a memory management unit that has two TLBs, and a bus control unit with an ECC (error-correcting code) circuit  相似文献   

11.
An 80-MFLOPS (peak) 64-b microprocessor that employs superscalar architecture to execute two instructions simultaneously in one 25-ns cycle, including the combination of 64-b floating-point add and multiply instructions, is described. The processor implemented in a 0.8-μm CMOS technology contains 1300 K transistors. The processor also employs a RISC architecture and Harvard-style bus organization. The authors provide an overview of the processor, especially focusing on processor architecture, floating-point hardware, and performance  相似文献   

12.
The first implementation of the IA-64 architecture achieves high performance by using a highly parallel execution core, while maintaining binary compatibility with the IA-32 instruction set. Explicitly parallel instruction computing (EPIC) design maximizes performance through hardware and software synergy. The processor contains 25.4 million transistors and operates at 800 MHz. The chip is fabricated in a 0.18-μm CMOS process with six metal layers and packaged in a 1012-pad organic land grid array using C4 (flip chip) assembly technology. A core speed back-side bus connects the processor to a 4-MB L3 cache  相似文献   

13.
A 550-MHz 64-b PowerPC processor in 0.2-um silicon-on-insulator (SOI) copper technology achieves a 22% frequency gain over a similar design in a CMOS bulk technology. Performance gains are 15%-40% at the circuit level, 24%-28%, for critical paths. Unique SOI design aspects such as history effect, lowered noise margins, parasitic bipolar current, and self-heating are considered  相似文献   

14.
A 32-b RISC/DSP microprocessor with reduced complexity   总被引:2,自引:0,他引:2  
This paper presents a new 32-b reduced instruction set computer/digital signal processor (RISC/DSP) architecture which can be used as a general purpose microprocessor and in parallel as a 16-/32-b fixed-point DSP. This has been achieved by using RISC design principles for the implementation of DSP functionality. A DSP unit operates in parallel to an arithmetic logic unit (ALU)/barrelshifter on the same register set. This architecture provides the fast loop processing, high data throughput, and deterministic program flow absolutely necessary in DSP applications. Besides offering a basis for general purpose and DSP processing, the RISC philosophy offers a higher degree of flexibility for the implementation of DSP algorithms and achieves higher clock frequencies compared to conventional DSP architectures. The integrated DSP unit provides instruction set support for highly specialized DSP algorithms. Subword processing optimized for DSP algorithms has been implemented to provide maximum performance for 16-b data types. While creating a unified base for both application areas, we also minimized transistor count and we reduced complexity by using a short instruction pipeline. A parallelism concept based on a varying number of instruction latency cycles made superscalar instruction execution superfluous  相似文献   

15.
A full-custom single-chip bipolar ECL RISC microprocessor was implemented in a 1.0-μm single-poly bipolar technology. This research prototype contains a CPU and on-chip 2-KB instruction and 2-KB data caches. Worst-case power dissipation with a nominal -5.2 V supply is 115 W. The chip has been designed for a worst-case clock frequency of 275 MHz at a nominal supply. The chip verifies a new style of CAD tools developed during the design process, advanced packaging techniques for high-power microprocessors, and VLSI ECL circuit techniques  相似文献   

16.
A dual-core 64-bit ultraSPARC microprocessor for dense server applications   总被引:1,自引:0,他引:1  
A dual-core 64-bit microprocessor optimized for compute-dense systems such as rack-mount and blade servers for network computing was developed. The chip consists of two UltraSPARC II cores, each with its own 512 kB L2 cache, a DDR-1 memory controller, and symmetric multiprocessor bus (JBus) controllers. The 206-mm/sup 2/ die is fabricated in 0.13-/spl mu/m CMOS technology with seven layers of Cu and a low-k dielectric. The chip offers a highly efficient performance-per-watt ratio with a typical power dissipation of 23 W at 1.3 V and 1.2 GHz. A short design cycle was achieved by leveraging existing designs wherever possible and developing effective design methodologies and flows. Significant design challenges faced by this project are described. These include deep-submicron design issues, such as negative bias temperature instability (NBTI), leakage, coupling noise, intra-die process variation, and electromigration (EM). A second important design challenge was implementing a high-performance L2 cache subsystem with a short four-cycle core-to-L2 latency including ECC.  相似文献   

17.
The authors describe circuit techniques for wide input/output (I/O) data path and high-speed 64-Mb dynamic RAMs (DRAMs). A hierarchical data bus structure using double-level metallization has been developed to form 64-b parallel data bus lines without increasing the chip size. A current-sensing data bus amplifier, developed to sense the 64-b data bus signal in parallel, has made the wide I/O data path structure possible. A direct-sensing type column gate circuit with the READ/WRITE separated select line scheme achieves 40-ns RAS access. A shielded bit-line three-dimensional stacked-capacitor cell with a double-fin storage capacitor stores sufficient charge while the bit-line capacitance shows a reasonable value for sensing the data  相似文献   

18.
This paper describes a 160 MHz 500 mW 32 b StrongARM(R) microprocessor designed for low-power, low-cost applications. The chip implements the ARM(R) V4 instruction set and is bus compatible with earlier implementations. The pin interface runs at 3.3 V but the internal power supplies can vary from 1.5 to 2.2 V, providing various options to balance performance and power dissipation. At 160 MHz internal clock speed with a nominal Vdd of 1.65 V, it delivers 185 Dhrystone 2.1 MIPS while dissipating less than 450 mW. The range of operating points runs from 100 MHz at 1.65 V dissipating less than 300 mW to 200 MHz at 2.0 V for less than 900 mW. An on-chip PLL provides the internal clock based on a 3.68 MHz clock input. The chip contains 2.5 million transistors, 90% of which are in the two 16 kB caches. It is fabricated in a 0.35-μm three-metal CMOS process with 0.35 V thresholds and 0.25 μm effective channel lengths. The chip measures 7.8 mm×6.4 mm and is packaged in a 144-pin plastic thin quad flat pack (TQFP) package  相似文献   

19.
A custom 529 K-transistor microprocessor with a five-stage pipeline has been implemented on a 12.98-mm2 die. Employing BiCMOS macrocells, a 32-b execution unit, extensible ROM, RAM, a PLL (phase-locked loop) clock generator with bipolar drivers, and sense circuits, and a peak performance of 70 MIPS (million instructions per second) are achieved. Power consumption is 2.1 W at 40 MHz  相似文献   

20.
An embedded RISC microprocessor core fabricated in a six-layer metal 0.18-μm CMOS process implementing the ARMTM V.5TE instruction set is described. The core described is the first implementation of the Intel XScale MicroarchitectureTM. The microprocessor core, which includes caches, memory management units, and a bus controller, comprises a hard-embedded block 16.77 mm2 in size. The implementation is primarily custom logic in a variety of circuit styles. The processor dissipates 450 mW at 1.3 V, 600 MHz, and scales between 55 mW at 0.7 V, 200 MHz, and 900 mW at 1.65 V 800 MHz. Architectural performance is 1000 MIPS at 800 MHz with efficiency ranging from over 850 MIPS/W at 1.65 V to over 4500 MIPS/W at 0.75 V. Architectural and circuit design approaches for low power and high performance are described and measured results from the initial implementation are shown. The first implementation VLSI chip has a 3.3-V pin interface and supports a 0.75-1.65-V core voltage range  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号