Similar literature
20 similar documents found.
1.
A 1.3-GHz fifth-generation SPARC64 microprocessor (total citations: 1; self-citations: 0; citations by others: 1)
A fifth-generation SPARC64 processor is fabricated in 130-nm partially depleted silicon-on-insulator CMOS with eight layers of Cu metallization. At V_dd = 1.2 V and T_a = 25 °C, it runs at 1.3 GHz and dissipates 34.7 W. The chip contains 191 M transistors with 19 M logic circuits in an area of 18.14 mm × 15.99 mm and is covered with 5858 bumps, of which 269 are for I/O signals. It is mounted in a 1360-pin land-grid-array package. The 16-byte-wide system bus operates with a 260-MHz clock in single-data-rate or double-data-rate modes. This processor implements an error-detection mechanism for execution units and data path logic circuits in addition to on-chip arrays to detect data corruption. Intermittent errors detected in execution units and data paths are recovered via instruction retry. A soft barrier clocking scheme allows amortization of the clock skew and jitter over multiple cycles and helps to achieve high clock frequency. Tunability of the clock timing makes timing closure easier. A relatively small amount of custom circuit design and the use of mostly static circuits contribute to a short development time.
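
The bus figures quoted in this abstract (16 bytes wide, 260-MHz clock, single- or double-data-rate) imply a peak transfer rate that can be worked out directly. The sketch below is only that arithmetic: it assumes one transfer per clock in single-data-rate mode, two in double-data-rate mode, no protocol overhead, and 1 GB = 10^9 bytes. The function name is illustrative and does not come from the paper.

```python
# Peak system-bus bandwidth from the figures quoted in the abstract.
# Assumptions (not stated in the abstract): 1 transfer/clock for SDR,
# 2 transfers/clock for DDR, no protocol overhead, 1 GB = 10**9 bytes.

BUS_WIDTH_BYTES = 16   # "16-byte-wide system bus"
BUS_CLOCK_MHZ = 260    # "260-MHz clock"


def peak_bandwidth_gb_per_s(width_bytes, clock_mhz, transfers_per_clock):
    """Return the theoretical peak bandwidth in GB/s."""
    bytes_per_second = width_bytes * clock_mhz * 1_000_000 * transfers_per_clock
    return bytes_per_second / 1e9


sdr_gb_per_s = peak_bandwidth_gb_per_s(BUS_WIDTH_BYTES, BUS_CLOCK_MHZ, 1)
ddr_gb_per_s = peak_bandwidth_gb_per_s(BUS_WIDTH_BYTES, BUS_CLOCK_MHZ, 2)

print(f"SDR peak: {sdr_gb_per_s:.2f} GB/s")
print(f"DDR peak: {ddr_gb_per_s:.2f} GB/s")
```

Under these assumptions the bus peaks at roughly 4.16 GB/s in single-data-rate mode and 8.32 GB/s in double-data-rate mode; sustained throughput on real hardware would be lower.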

2.
This paper describes the design of a system bus interface for the 130-nm Itanium® 2 processor that operates at 400 MT/s (1 megatransfer = 1 Mb/s/pin) with a peak bandwidth of 6.4 GB/s. The high-speed operation is achieved by employing source-synchronous transfer with differential strobes. Short flight time is accomplished by double-sided placement of the processors. Preboost and postboost edge-rate control enables fast clock-to-output timing with tight edge-rate range. The built-in input/output (I/O) loopback test feature enables I/O parameters to be tested on die, using a delay-locked loop and interpolator with 21-ps phase-skew error and 15-ps rms jitter. Power modeling methodology facilitates accurate prediction of system performance.

3.
A 2-µm CMOS VLSI digital signal processor (DSP) family, the SP50, is described that is capable of eight million instructions per second and up to six concurrent operations in each instruction. Two DSPs, the PCB5010 and PCB5011, have been developed. Both are based on a common architecture which contains two 16-bit data buses, and a 16 × 16 → 40-bit multiplier accumulator and 16-bit ALU, both with multiprecision support in hardware. Also implemented are two static data RAMs (128 × 16 or 256 × 16), a data ROM (51 × 16), a 15-word three-port register file, three address computation units, and five serial and parallel I/O interfaces. The data path is controlled by an orthogonal instruction set, using 40-bit microcode words. The controller contains a five-level stack and an instruction repeat register, and can have either on-chip program memory (RAM: 32 × 40; ROM: 987 × 40) or off-chip program memory (up to 64K × 40). Benchmarks show a two to sixfold improvement in overall performance over its predecessors.

4.
The design, layout, and testing of a 5 Mb/s digital multiplexer using 2-µm design rules for fiber-optic applications are described. Using a common 10-bit bus, the chip reads data from 16 sources in response to a DATA READY signal. Serial output includes a parity bit and is sent to an LED driver. Handshaking, sequential control, parity checking, and data formatting are covered.
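
The abstract mentions a parity bit on the serial output and parity checking, but gives no frame layout. As a rough illustration of the idea only, the sketch below appends an even-parity bit to a 10-bit word and checks it on receipt. The choice of even parity, the position of the parity bit (least significant), and both function names are assumptions, not details from the paper.

```python
# Illustrative parity framing for a 10-bit word.
# Assumptions (not from the abstract): even parity, parity bit appended
# as the least significant bit of the frame.

WORD_WIDTH = 10  # "common 10-bit bus"


def add_even_parity(word, width=WORD_WIDTH):
    """Append a parity bit so the whole frame has an even number of 1s."""
    if not 0 <= word < (1 << width):
        raise ValueError(f"word must fit in {width} bits")
    parity = bin(word).count("1") % 2
    return (word << 1) | parity


def check_even_parity(frame):
    """Return True if the frame (word plus parity bit) has even parity."""
    return bin(frame).count("1") % 2 == 0


frame = add_even_parity(0b0000000111)   # three 1s, so the parity bit is 1
print(bin(frame), check_even_parity(frame))
print(check_even_parity(frame ^ 0b1))   # a single flipped bit is detected
```

A single parity bit detects any odd number of bit errors in a frame but cannot detect an even number of errors or locate the faulty bit.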

5.
A single-chip 80-bit floating point VLSI processor capable of performing 5.6 million floating point operations per second has been realized using 1.2-µm n-well CMOS technology. The processor handles 80-bit double-extended floating point data conforming to IEEE standard 754. The chip has 128 microinstructions which are stored in an on-chip ROM. By programming microinstruction sequences in an external control storage, not only basic arithmetic operations but also special arithmetic functions can be performed. A composite design method supported by a hierarchical design automation system was used to quickly lay out 50K gates including a 64 × 64-bit multiplier and 15 kb of memory on a chip with a die size of 10 × 10 mm². Only 11 man-months were required for the effort.

6.
The Pentium® 4 processor architecture uses a 2× frequency core clock to implement low latency integer operations. Low-voltage-swing (LVS) logic circuits implemented in 90-nm technology meet the frequency demands of a third-generation integer-core design.

7.
This paper describes the design, realization, and evaluation of a mixed-signal motion estimation processor using the full-search block-matching algorithm. The approach features digital I/O and a low-power, compact analog computational core. The proof-of-concept realization, whose architecture incorporates pixel reuse, was fabricated in 0.8-µm CMOS technology occupying 0.65 mm², and operates on 4 × 4 pixel blocks and a search area of 8 × 8 pixels. The processor achieves a low energy consumption per motion vector of 1.35 nJ and dissipates 0.8 mW from a 3-V power supply at QCIF 15 frames/s. The approach is intended for portable applications of digital video encoding.
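
For readers unfamiliar with full-search block matching, the following is a minimal software sketch of the algorithm on the block and search-area sizes quoted above (4 × 4 block, 8 × 8 search area). It assumes the sum of absolute differences (SAD) as the matching cost, which the abstract does not specify, and it says nothing about how the paper's analog core computes the result. The function names and the synthetic test image are hypothetical.

```python
# Full-search block matching: try every candidate position of the block
# inside the search area and keep the one with the lowest matching cost.
# Assumption (not from the abstract): the cost is the sum of absolute
# differences (SAD). Offsets are measured from the search area's top-left.


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def full_search(current_block, search_area):
    """Return (best_cost, dx, dy) over all candidate positions."""
    n = len(current_block)
    height, width = len(search_area), len(search_area[0])
    best = None
    for dy in range(height - n + 1):
        for dx in range(width - n + 1):
            candidate = [row[dx:dx + n] for row in search_area[dy:dy + n]]
            cost = sad(current_block, candidate)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic 8 x 8 search area with unique pixel values, and a 4 x 4 block
# cut out of it at column 2, row 3.
area = [[r * 8 + c for c in range(8)] for r in range(8)]
block = [row[2:6] for row in area[3:7]]

print(full_search(block, area))
```

With a 4 × 4 block in an 8 × 8 area the search evaluates 5 × 5 = 25 candidate positions, each needing 16 absolute differences, which is why full search is costly in software and attractive to accelerate in hardware.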

8.
A 16-bit LSI minicomputer, using n-channel MOS technology, has been developed. The instruction set contains 126 instructions including floating-point arithmetic and is fully compatible with commercially available minicomputers such as the TOSBAC-40 and the Interdata 70. An execution speed of 2 µs is obtained for register to register (RR) instructions. All the central processing unit (CPU) functions are implemented on a single board. An external microprogram ROM and short-single address microinstructions are used to realize high-system performance and reduce the chip area and the package pin numbers. Two LSI chips for the system, a single-chip processor, and a bit-sliced bus controller, are fabricated by a new n-channel MOS technology named the gate oxidation method (GOM) which provides a high-packing density, high speed, and a simplified process.

9.
A single-chip CMOS optical microspectrometer containing an array of 16 addressable Fabry-Perot etalons (each one with a different resonance cavity length), photodetectors, and circuits for readout, multiplexing, and driving a serial bus interface has been fabricated in a standard 1.6 µm CMOS technology (chip area 3.9 × 4.2 mm²). The result is a chip that can operate using only four external connections (including V_dd and V_ss) covering the optical range of 380-500 nm with full-width half-maximum (FWHM) = 18 nm. Frequency output and serial bus interface allow easy multisensor and multichip interfacing using a microcontroller or a personal computer. Power consumption is 1250 µW for a clock frequency of 1 MHz.

10.
This paper describes an adaptive bandwidth bus (ABB) architecture based on hybrid current/voltage mode repeaters for long global RC interconnect static busses that achieves high-data rates while minimizing the static power dissipation associated with current-mode signaling. Attaining a maximum aggregate bandwidth of 16 Gb/s (i.e., 1 Gb/s per line) across lossy on-chip interconnects spanning 1.75 cm in length, the bus core fabricated in 0.35 µm CMOS technology dissipates approximately 93 mW with a supply of 2.5 V and signal activity of 0.5, equivalent to 5.71 pJ/bit. Experimental results using a 16-bit reference bus design that can be externally programmed to operate in voltage, current or adaptive modes indicate a 50% reduction in power dissipation over current-mode (CM) sensing, and an improvement in interconnection delay and signaling bandwidth of 35%-70% and 66% over voltage-mode (VM) sensing, respectively.

11.
The circuit and the design of an experimental 16-bit peripheral processor are described. The circuit is used in controller applications between mass storage memories and the CPU of mainframes. The chip is fabricated in 2-µm NMOS technology using polycide and 2 metal layers. This component (300,000 transistors, 105 mm², 152 pins) handles data rates up to 5 Mb/s and has a power dissipation of about 2 W. A highly modular and regular design and some automatically generated layouts resulted in a short design time. The top-down design required intensive floorplanning. Outstanding features are the function slice microinstruction decoding scheme and a large on-chip microprogram RAM with 36 kbits.

12.
The load/store pipe for a low-power 1-GHz embedded processor is described. For area savings and logic complexity reduction, the load/store pipe is clocked at twice the frequency of the processor core. It can sustain two load or store operations per core clock cycle with zero load-to-use issue latency. The address generation unit for one of the two load/store pipes takes advantage of the common addressing mode in MIPS 64 ISA to generate the address within a core clock phase. Phase borrowing is employed in the translation lookaside buffer (TLB) design to enable a lookup process within a core clock phase. The data cache design enables the activation of a minimum number of data bank arrays for power savings. Small-swing differential buses are used for multiple address and data buses for improved signal transmission latency. The quadrature clocks used to derive the 2× clock are generated with a novel 4-to-1 divider and distributed with matched paths, all to reduce the duty cycle variation of the 2× clock phase. The design has been implemented in a 0.13-µm CMOS process.

13.
A 256-bit × 4-bit static RAM working on a supply voltage down to 1.2 V is described. A serial interface for the address and the data with a 4-bit bus reduces the pin count of the RAM to only 8. Special design techniques to reach the design goal (very low power at a reasonable circuit speed) are discussed in detail. The device is fabricated in a low power silicon gate CMOS process. An operating power of 500 µW/MHz and a standby power of less than 1 µW at 1.5 V supply voltage were achieved. With this serial interface a cycle time of 1 µs at 1.5 V was measured.

14.
A double word-line memory ROM (DWM-ROM) for use in gate arrays is described. It allows for an automatic layout by reducing the input pin count in the word lines by using two-step addressing. The advantage of this method has been verified by implementing a 16-bit microprocessor using an 8 K-gate array, based on a gate-isolated cell configuration, employing 1.5-µm double-metal CMOS technology. The 16-bit × 64-word ROM in the processor saves 30% of the transistor area due to the DWM-ROM.

15.
A double/single-precision floating-point processor using a titanium disilicide 3.5-µm NMOS process achieves double-precision add/subtract, multiply, and divide in 2, 8, and 16 µs, respectively. The chip has about 35K devices and is about 400 mil on a side. The chip uses a single 5-V supply with TTL-compatible levels on all signals except for the clocks, which require 4.5 V for a logic high. Four input clocks are used to generate eight 50-ns intervals. A -2.5 V substrate bias generator is designed on the chip but uses a pin for an external capacitor. The processor, which is to be used in a desktop implementation of a minicomputer, executes the floating-point instruction set for the micro-Eclipse computer.

16.
A 24-bit microprogrammed processor with 200 ns instruction cycle time has been realized as an experimental special purpose VLSI chip. The design was based on a general cell library and a set of advanced CAD tools. The technology used is a 3 µm silicon gate, n-channel, single metallization MYMOS process. The chip integrates 9400 gate functions plus a 256 × 27 bit static RAM on 78.5 mm².

17.
A 600-MHz single-chip multiprocessor, which includes two M32R 32-bit CPU cores, a 512-kB shared SRAM and an internal shared pipelined bus, was fabricated using a 0.15-µm CMOS process for embedded systems. This multiprocessor is based on symmetric multiprocessing (SMP), and supports the modified-exclusive-shared-invalid (MESI) cache coherency protocol. The multiprocessor inherits the advantages of previously reported single-chip multiprocessors, while its multiprocessor architecture is optimized for use as an embedded processor. The internal shared pipelined bus has a low latency and large bandwidth (4.8 GB/s). These features enhance the performance of the multiprocessor. In addition, the multiprocessor employs various low-power techniques. The multiprocessor dissipates 800 mW in a 1.5-V 600-MHz multiprocessor mode. Standby power dissipation is less than 1.5 mW at 1.5 V. Hence, the multiprocessor achieves higher performance and lower power consumption. This paper presents a single-chip multiprocessor architecture optimized for use as an embedded processor and its various low-power techniques.

18.
A smart-sensor VLSI circuit suitable for focal-plane low-level image processing applications is presented. The architecture of the device is based on a fine-grain software-programmable SIMD processor array. Processing elements, integrated within each pixel of the imager, are implemented utilising a switched-current analog microprocessor concept. This allows the achievement of real-time image processing speeds with high efficiency in terms of silicon area and power dissipation. The prototype 21 × 21 vision chip is fabricated in a 0.6 µm CMOS technology and achieves a cell size of 98.6 µm × 98.6 µm. It executes over 1.1 giga instructions per second (GIPS) while dissipating under 40 mW of power. The architecture, circuit design and experimental results are presented in this paper.

19.
The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical single-precision multiply-add operations. Since exact rounding, exceptions, and de-norm number handling are not important to multimedia applications, IEEE correctness on the single-precision floating-point numbers is sacrificed for performance and simple design. It employs fine-grained clock gating for power saving. The design has 768K transistors in 1.3 mm², fabricated in 90-nm SOI technology. Correct operations have been observed up to 5.6 GHz with 1.4 V and 56 °C, delivering 44.8 GFlops. Architecture, logic, circuits, and integration are codesigned to meet the performance, power, and area goals.
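
The 44.8 GFlops figure is consistent with the other numbers in the abstract if a multiply-add is counted as two floating-point operations and one multiply-add completes per SIMD lane per cycle. Both of those are assumptions made for this sketch (the abstract does not spell out the counting convention), and the function name is illustrative.

```python
# Peak throughput check for the figures quoted in the abstract.
# Assumptions (not stated in the abstract): one multiply-add completes per
# SIMD lane per cycle, and each multiply-add counts as 2 flops.

CLOCK_GHZ = 5.6          # "Correct operations have been observed up to 5.6 GHz"
SIMD_LANES = 4           # "4-way ... SIMD unit"
FLOPS_PER_MULTIPLY_ADD = 2


def peak_gflops(clock_ghz, lanes, flops_per_lane_per_cycle):
    """Theoretical peak in GFlops: cycles/ns x lanes x flops per lane per cycle."""
    return clock_ghz * lanes * flops_per_lane_per_cycle


peak = peak_gflops(CLOCK_GHZ, SIMD_LANES, FLOPS_PER_MULTIPLY_ADD)
print(f"Peak: {peak:.1f} GFlops")
```

Under these assumptions the product is 5.6 × 4 × 2 = 44.8 GFlops, matching the quoted figure.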

20.
We present a tool that, starting from high-level specifications of switched-capacitor (SC) ΣΔ modulators, calculates optimum specifications for their building blocks and then optimum sizes for the block schematics. At both design levels, optimization is performed using statistical techniques to enable global design and innovative heuristics for increased computer efficiency as compared with conventional statistical optimization. The tool uses an equation-based approach at the modulator level, a simulation-based approach at the cell level, and incorporates an advanced ΣΔ behavioral simulator for monitoring and design space exploration. We include measurements taken from two silicon prototypes: (1) a 16 b @ 16 kHz output rate second-order ΣΔ modulator; and (2) a 17 b @ 40 kHz output rate fourth-order ΣΔ modulator. Both use SC fully differential circuits and were designed using the proposed tool and manufactured in a 1.2 µm CMOS double-metal double-poly technology.
