首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
This superscalar microprocessor is the first implementation of a 32-bit RISC architecture specification incorporating a single-instruction, multiple-data vector processing engine. Two instructions per cycle plus a branch can be dispatched to two of seven execution units in this microarchitecture designed for high execution performance, high memory bandwidth, and low power for desktop, embedded, and multiprocessing systems. The processor features an enhanced memory subsystem, 128-bit internal data buses for improved bandwidth, and 32-KB eight-way instruction/data caches. The integrated L2 tag and cache controller with a dedicated L2 bus interface supports L2 cache sizes of 512 KB, 1 MB, or 2 MB with two-way set associativity. At 450 MHz, and with a 2-MB L2 cache, this processor is estimated to have a floating-point and integer performance metric of 20 while dissipating only 7 W at 1.8 V. The 10.5 million transistor, 83-mm2 die is fabricated in a 1.8-V, 0.20-μm CMOS process with six layers of copper interconnect  相似文献   

2.
A dual-core 64-bit ultraSPARC microprocessor for dense server applications   总被引:1,自引:0,他引:1  
A dual-core 64-bit microprocessor optimized for compute-dense systems such as rack-mount and blade servers for network computing was developed. The chip consists of two UltraSPARC II cores, each with its own 512 kB L2 cache, a DDR-1 memory controller, and symmetric multiprocessor bus (JBus) controllers. The 206-mm/sup 2/ die is fabricated in 0.13-/spl mu/m CMOS technology with seven layers of Cu and a low-k dielectric. The chip offers a highly efficient performance-per-watt ratio with a typical power dissipation of 23 W at 1.3 V and 1.2 GHz. A short design cycle was achieved by leveraging existing designs wherever possible and developing effective design methodologies and flows. Significant design challenges faced by this project are described. These include deep-submicron design issues, such as negative bias temperature instability (NBTI), leakage, coupling noise, intra-die process variation, and electromigration (EM). A second important design challenge was implementing a high-performance L2 cache subsystem with a short four-cycle core-to-L2 latency including ECC.  相似文献   

3.
A chip set for high-speed radix-2 fast Fourier transform (FFT) applications up to 512 points is described. The chip set comprises a (16+16)/spl times/(12+12)-bit complex number multiplier, and a 16-bit butterfly chip for data reordering, twiddle factor generation, and butterfly arithmetic. The chips have been implemented using a standard cell design methodology on a 2-/spl mu/m bulk CMOS process. Three chips implement a complex FFT butterfly with a throughput of 10 MHz, and are cascadable up to 512 points. The chips feature an offline self-testing capability.  相似文献   

4.
This quad-issue processor achieves 1-GHz operation through improved dynamic circuit techniques in critical paths and a more extensive on-chip memory system which scales in both bandwidth and latency. Critical logic paths use domino, delayed clocked domino, and logic embedded in dynamic flip-flops for minimum delay. A 64-KB sum-addressed memory data cache combines the address offset add with the cache decode, allowing the average memory latency to scale by more than the clock ratio. Memory bandwidth is improved by using wave pipelined SRAM designs for on-chip caches and a write cache for store traffic. Memory power is controlled without increased latency by use of delayed-reset logic decoders. The chip operates at 1000 MHz and dissipates less than 80 W from a 1.6-V supply. It contains 23 million transistors (12 million in RAM cells) on a 244 mm2 die  相似文献   

5.
This third-generation 1.1-GHz 64-bit UltraSPARC microprocessor provides 1-MB on-chip level-2 cache, 4-Gb/s off chip memory bandwidth, and a new 200 MHz JBus interface that supports one to four processors. The 87.5-million transistor chip is implemented in a seven-layer-metal copper 0.13-/spl mu/m CMOS process and dissipates 53 W at 1.3 V and 1.1 GHz.  相似文献   

6.
A 1-V 16-KB (L2) 2-KB (L1) four-way set-associative cache was fabricated using a 0.25-μm CMOS technology for future low-power high-speed microprocessors. Effective latency of 6.9 ns and power consumption of 10 mW at 100 MHz are obtained at a supply voltage of 1 V. This performance is achieved by using a new separated bit-line memory hierarchy architecture (SBMHA) that speeds up latency and reduces power consumption, and domino tag comparators (DTC's) that reduce the power dissipation of tag comparisons  相似文献   

7.
A 4K/spl times/8 MOS dynamic RAM using a single transistor cell with on-chip self-refresh is described. The device uses a multiplexed address/data bus. Control of the reconfigurable data bus allows the RAM to operate on either an 8-bit or a 16-bit data bus. The memory cell is fabricated using a double polysilicon n-channel HMOS technology using polysilicon word lines and metal bit lines. Self-refresh is implemented with an on-chip timer, arbiter, counter and multiplexer. A high-speed arbiter resolves simultaneous memory and refresh requests. Redundant rows are used for increased manufacturing yields. Polysilicon fuses are electrically programmed to select redundant rows.  相似文献   

8.
The load/store pipe for a low-power 1-GHz embedded processor is described. For area savings and logic complexity reduction, the load/store pipe is clocked at twice the frequency of the processor core. It can sustain two load or store operations per core clock cycle with zero load to use issue latency. The address generation unit for one of the two load/store pipes takes advantage of the common addressing mode in MIPS 64 ISA to generate the address within a core clock phase. Phase borrowing is employed in the translation lookaside buffer (TLB) design to enable a lookup process within a core clock phase. The data cache design enables the activation of a minimum number of data bank arrays for power savings. Small-swing differential buses are used for multiple address and data buses for improved signal transmission latency. The quadrature clocks used to derive the 2/spl times/ clock are generated with a novel 4-to-1 divider and distributed with matched paths, all to reduce the duty cycle variation of the 2/spl times/ clock phase. The design has been implemented in a 0.13-/spl mu/m CMOS process.  相似文献   

9.
A 2-/spl mu/m CMOS VLSI digital signal processor (DSP) family, the SP50, is described that is capable of eight million instructions per second and up to six concurrent operations in each instruction. Two DSPs, the PCB5010 and PCB5011, have been developed. Both are based on a common architecture which contains two 16-bit data buses, and a 16/spl times/16/spl rarr/40-bit multiplier accumulator and 16-bit ALU, both with multiprecision support in hardware. Also implemented are two static data RAMs (128/spl times/16 or 256/spl times/16), a data ROM (51/spl times/16), a 15-word three-port register file, three address computation units, and five serial and parallel I/O interfaces. The data path is controlled by an orthogonal instruction set, using 40-bit microcode words. The controller contains a five-level stack and an instruction repeat register, and can have either on-chip program memory (RAM: 32/spl times/40; ROM: 987/spl times/40) or off-chip program memory (up to 64K/spl times/40). Benchmarks show a two to sixfold improvement in overall performance over its predecessors.  相似文献   

10.
This paper presents the integrated circuit design for a wireless bidirectional transmission microstimulator. This implantable device is composed of an internal radio-frequency (RF) front-end circuit, a control circuit, a stimulator, and an on-chip transmitter. A 2-MHz amplitude-shift keying modulated signal, including the power and data necessary for the implantable device, is received, and a stable 3-V dc voltage and digital data will be extracted to further execute neuromuscular stimulation. The current-mode microstimulator can produce a bidirectional output current with 8-bit resolution for stimulation. The maximum stimulation current is 1 mA while the stimulation frequency is from 20 Hz to 2 kHz and the pulsewidth of stimulation current is from 150 to 500 /spl mu/s. Furthermore, the system can acquire the biological sensing signal by means of an on-chip transmitter. Most of the signal processing circuits have been designed with low-power schemes to reduce the power consumption, and the performance is also conformed to the requirements of the microstimulator. All of the circuits except for the RF link are combined in a single chip and implemented in TSMC 0.35-/spl mu/m 2P4M standard CMOS process.  相似文献   

11.
The two functions are implemented in a custom integrated circuit (TCF). The system, circuit, and technological aspects of the design are described. The TCF chip performs transcoding from/to 13-bit linear to/from 8 bit, A-law or /spl mu/-law companded PCM, for 32 multiplexed channels. Every channel is transcoded, in both directions, every 125 /spl mu/s. A second function of the device is the generation of metering signal bursts free of audible clicks. The metering frequency is selectable at 12 or 16 kHz. In order to achieve low power consumption and minimal area, all the circuitry is implemented using fully dynamic CMOS.  相似文献   

12.
A single-chip 80-bit floating point VLSI processor capable of performing 5.6 million floating point operations per second has been realized using 1.2-/spl mu/m n-well CMOS technology. The processor handles 80-bit double-extended floating point data conforming to IEEE standard 754. The chip has 128 microinstructions which are stored in an on-chip ROM. By programming microinstruction sequences in an external control storage, not only basic arithmetic operation but also special arithmetic functions can be performed. A composite design method supported by a hierarchical design automation system was used to quickly lay out 50K gates including a 64-/spl times/64-bit multiplier and 15 kb of memory on a chip with a die size of 10/spl times/10 mm/SUP 2/. Only 11 man-months were required for the effort.  相似文献   

13.
A 32-KB standard CMOS antifuse one-time programmable (OTP) ROM embedded in a 16-bit microcontroller as its program memory is designed and implemented in 0.18-$muhbox m$standard CMOS technology. The proposed 32-KB OTP ROM cell array consists of 4.2$muhbox m^2$three-transistor (3T) OTP cells where each cell utilizes a thin gate-oxide antifuse, a high-voltage blocking transistor, and an access transistor, which are all compatible with standard CMOS process. In order for high density implementation, the size of the 3T cell has been reduced by 80% in comparison to previous work. The fabricated total chip size, including 32-KB OTP ROM, which can be programmed via external$hboxI^2hboxC$master device such as universal$hboxI^2hboxC$serial EEPROM programmer, 16-bit microcontroller with 16-KB program SRAM and 8-KB data SRAM, peripheral circuits to interface other system building blocks, and bonding pads, is 9.9$hbox mm^2$. This paper describes the cell, design, and implementation of high-density CMOS OTP ROM, and shows its promising possibilities in embedded applications.  相似文献   

14.
A self-controllable voltage level (SVL) circuit which can supply a maximum dc voltage to an active-load circuit on request or can decrease the dc voltage supplied to a load circuit in standby mode was developed. This SVL circuit can drastically reduce standby leakage power of CMOS logic circuits with minimal overheads in terms of chip area and speed. Furthermore, it can also be applied to memories and registers, because such circuits fitted with SVL circuits can retain data even in the standby mode. The standby power of an 8-bit 0.13-/spl mu/m CMOS ripple carry adder (RCA) with an on-chip SVL circuit is 8.2 nW, namely, 4.0% of that of an equivalent conventional adder, while the output signal delay is 786 ps, namely, only 2.3% longer than that of the equivalent conventional adder. Moreover, the standby power of a 512-bit memory cell array incorporating an SVL circuit for a 0.13-/spl mu/m 512-bit SRAM is 69.1 nW, which is 3.9% of that of an equivalent conventional memory-cell array. The read-access time of this 0.13-/spl mu/m SRAM is 285 ps, that is, only 2 ps slower than that of the equivalent SRAM.  相似文献   

15.
Design data and experimental characteristics are given on an 8192-bit n-channel charge-coupled memory device, intended for applications requiring shorter latency than ordinary MOS shift registers or fixed-head disks and at potentially lower cost than either MOS shift registers or random-access memories. This was achieved by dividing the array into 32 memory blocks of 256 bits each, with addressable, random access to any block, permitting average latency of approximately 100 /spl mu/s. A two-level overlapping polysilicon gate process was used, with conservative design tolerances. Power dissipation on-chip, plus capacitive drive power during data access at 1 MHz is approximately 250 mW, and less than 5 mW during standby at 20 kHz with data retention.  相似文献   

16.
A general-purpose programmable digital signal processor (DSP) has been implemented in 1.5-/spl mu/m (L/SUB eff/) NMOS technology using full-custom circuit design for high performance. The DSP has a 32-bit instruction set, 32-bit data path, and full-hardware 32-bit floating-point arithmetic. The architecture is described section by section, and an overview of the instruction set is presented. The extensive design verification process applied to the DSP is also described.  相似文献   

17.
A 16384-bit charge-coupled device (CCD) memory has been developed for mass storage memory system application where moderate latency, high data rate and low system cost are required. The chip measures only 3.45/spl times/4.29 mm/SUP 2/ (136/spl times/169 mil/SUP 2/), fits a standard 16-pin package, and is organized as four separate shift registers of 4096 bits, each with its own data input and data output terminals. A two-level polysilicon gate n-channel process was used for device fabrication. A condensed serial-parallel-serial (CSPS) structure was found to provide the highest packing density. Only two external clocks are required driving capacitances of 60 pF each at one-half the data transfer rate. Operations at data rates of 100 kHz to 10 MHz have been demonstrated experimentally, the on-chip power dissipation at 10 MHz being less than 20 /spl mu/W/bit.  相似文献   

18.
The circuit and design of an experimental 32-bit execution unit are described. It is fabricated in a scaled NMOS single-layer poly-technology with 2-/spl mu/m minimum gate length and low-ohmic polycide for gates and interconnections. The chip (25000 transistors, 16 mm/SUP 2/, 61 pins) is designed with a high degree of regularity and modularity. The circuit performs logic and arithmetic operations and has an on-chip control ROM for instruction decoding. It operates with a single 5-V supply voltage. Measurements resulted in a typical power dissipation of 750 mW and a maximum operation frequency of 6.5 MHz. At this frequency a 32/spl times/32 bit multiplication is performed in less than 5.5 /spl mu/s.  相似文献   

19.
A full-custom single-chip bipolar ECL RISC microprocessor was implemented in a 1.0-μm single-poly bipolar technology. This research prototype contains a CPU and on-chip 2-KB instruction and 2-KB data caches. Worst-case power dissipation with a nominal -5.2 V supply is 115 W. The chip has been designed for a worst-case clock frequency of 275 MHz at a nominal supply. The chip verifies a new style of CAD tools developed during the design process, advanced packaging techniques for high-power microprocessors, and VLSI ECL circuit techniques  相似文献   

20.
A new single-chip 16-bit monolithic digital/analog converter (DAC) with on-chip voltage reference and operational amplifiers has achieved /spl plusmn/0.0015% linearity, 10 ppm//spl deg/C gain drift, and 4-/spl mu/s settling time. Novel elements of the 16-bit DAC include: the fast settling open-loop reference with a buried Zener, a fast-settling output operational amplifier without the use of feedforward compensation, and a modified R-2R ladder network. Thermal considerations played a significant role in the design. The DAC is fabricated using a 20-V process to reduce device sizes and therefore die size. All laser trimming including temperature drift compensation is performed at the wafer level. The converter does not require external components for operation.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号