首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The Pentium/spl reg/ 4 processor architecture uses a 2/spl times/ frequency core clock to implement low latency integer operations. Low-voltage-swing (LVS) logic circuits implemented in 90-nm technology meet the frequency demands of a third-generation integer-core design.  相似文献   

2.
The 3-MB on-chip level three cache in the Itanium 2 processor, built on an 0.18-/spl mu/m, six-layer Al metal process, employs a subarray design style that efficiently utilizes available area and flexibly adapts to floor plan changes. Through a distributed decoding scheme and compact circuit design and layout, 85% array efficiency was achieved for the subarrays. In addition, various test and reliability features were included. The cache allows for a store and a load every four core cycles and has been characterized to operate above 1.2 GHz at 1.5 V and 110/spl deg/C. When running at 1.0 GHz, the cache provides a total bandwidth of 64 GB/s.  相似文献   

3.
As bus lengths on multihundred-million transistor systems-on-a-chip (SoC) grow, and as interwire capacitances of sub-0.10 /spl mu/m technologies advance, the resulting high-switching capacitances of buses (and interconnects in general) have a nonnegligible impact on the power consumption of a whole SoC. This trend has been recognized and recently addressed by various research groups. We address this problem by introducing our bus encoding technique, adaptive dictionary-encoding scheme "ADES" that minimizes the power consumption of data buses through a dictionary-based encoding technique. Based on exploration of data properties on buses, our technique saves on average more than 25% of bus energy compared to the nonencoded cases using a large set of real-world applications for both address and data buses. Furthermore, we compare our technique to the best-known data bus encoding techniques to date and we find that it exceeds all of them in terms of energy savings for the same set of applications.  相似文献   

4.
This paper describes the design and implementation of a very long instruction word (VLTW) microprocessor. The VLIW integer processor (VIPER) contains four pipelined functional units and can achieve 0.25-cycle-per-instruction performance. The processor is capable of performing multiway branch operations, two load/store operations, or up to four ALU operations in each clock cycle, with full register file access to each functional unit. Designed in twelve months, the processor is integrated with an instruction cache controller and a data cache, requiring 450,000 transistors and a die size of 12.9 mm×9.1 mm in a 1.2-μm technology  相似文献   

5.
The microarchitecture of the synergistic processor for a cell processor   总被引:1,自引:0,他引:1  
This paper describes an 11 FO4 streaming data processor in the IBM 90-nm SOI-low-k process. The dual-issue, four-way SIMD processor emphasizes achievable performance per area and power. Software controls most aspects of data movement and instruction flow to improve memory system performance and core performance density. The design minimizes instruction latency while providing for fine grain clock control to reduce power.  相似文献   

6.
Dual on-chip 512-KB unified second level (L2) caches for an UltraSparc processor are implemented using 0.13-/spl mu/m technology. Each 512-KB unit is implemented using 34 million transistors to achieve 1.4 GHz and 2.6 W at 1.3 V and 85/spl deg/C. This fully integrated subsystem is composed of conventional data and tag SRAMs along with datapaths, controller, and test engines. The unit achieves one of the shortest on-chip L2 cache latencies reported for 64-bit microprocessors, with a data latency of only four cycles including ECC correction for 128-bit data. In addition, balanced custom and automated design methodologies are used to achieve the aggressive design cycle. Architectural and physical design solutions to build this integrated short latency L2 cache are discussed.  相似文献   

7.
A 2-/spl mu/m CMOS VLSI digital signal processor (DSP) family, the SP50, is described that is capable of eight million instructions per second and up to six concurrent operations in each instruction. Two DSPs, the PCB5010 and PCB5011, have been developed. Both are based on a common architecture which contains two 16-bit data buses, and a 16/spl times/16/spl rarr/40-bit multiplier accumulator and 16-bit ALU, both with multiprecision support in hardware. Also implemented are two static data RAMs (128/spl times/16 or 256/spl times/16), a data ROM (51/spl times/16), a 15-word three-port register file, three address computation units, and five serial and parallel I/O interfaces. The data path is controlled by an orthogonal instruction set, using 40-bit microcode words. The controller contains a five-level stack and an instruction repeat register, and can have either on-chip program memory (RAM: 32/spl times/40; ROM: 987/spl times/40) or off-chip program memory (up to 64K/spl times/40). Benchmarks show a two to sixfold improvement in overall performance over its predecessors.  相似文献   

8.
A wide-range delay-locked loop with a fixed latency of one clock cycle   总被引:1,自引:0,他引:1  
A delay-locked loop (DLL) with wide-range operation and fixed latency of one clock cycle is proposed. This DLL uses a phase selection circuit and a start-controlled circuit to enlarge the operating frequency range and eliminate harmonic locking problems. Theoretically, the operating frequency range of the DLL can be from 1/(N/spl times/T/sub Dmax/) to 1/(3T/sub Dmin/), where T/sub Dmin/ and T/sub Dmax/ are the minimum and maximum delay of a delay cell, respectively, and N is the number of delay cells used in the delay line. Fabricated in a 0.35 /spl mu/m single-poly triple-metal CMOS process, the measurement results show that the proposed DLL can operate from 6 to 130 MHz, and the total delay time between input and output of this DLL is just one clock cycle. From the entire operating frequency range, the maximum rms jitter does not exceed 25 ps. The DLL occupies an active area of 880 /spl mu/m/spl times/515 /spl mu/m and consumes a maximum power of 132 mW at 130 MHz.  相似文献   

9.
A single-path pulsewidth control loop with a built-in delay-locked loop   总被引:1,自引:0,他引:1  
A 1-1.27-GHz single-path pulse width control loop with a built-in delay-locked loop is presented. Based on the proposed circuit, not only can the 50% duty cycle of the output clock be assured but the phase alignment between the reference and output clocks can also be achieved. Moreover, the requirement of the reference clock with 50% duty cycle can be eliminated. By the single-to-complementary circuit and the switched charge pump, the duty cycle error can be reduced. Moreover, the duty cycle of the output clock can be adjusted for applications such as time-interleaved analog-to-digital converters, switched-capacitor circuits, and dc-dc converters. The proposed circuit has been fabricated in a 0.35-/spl mu/m CMOS process. The power consumption is 150 mW and the die area of the core circuit is 0.47/spl times/0.3 mm/sup 2/. The duty cycle of the output clock can be adjusted from 35% to 70% in steps of 5%.  相似文献   

10.
This paper reports the design and measurement results of a write pulse generator IC for rewritable CD and DVD disk drives implemented in a standard digital 0.35 /spl mu/m CMOS technology. The chip is the interface between a processor and a laser driver. It provides accurate timing signals to the laser driver via a four-level differential current interface. Transitions between current levels are programmable with 149 ps resolution at a data rate of 420 Mb/s, corresponding to 16x DVD write speed. The chip includes a digital core managing the different write strategies, a CMOS serial interface to the processor for programming, a low power, low phase noise, 64-phase ring voltage-controlled oscillator (VCO) based on CMOS inverters, a phase-locked loop (PLL) locking the VCO to the system clock, and a current interface to the laser driver. The PLL phase noise is -144 dBc/Hz at 10 MHz offset from the 105 MHz carrier. At this frequency, the rms jitter is 1.1 ps with 0.8 mA VCO core supply current. The chip is fully ESD protected.  相似文献   

11.
A 20-Gb/s transmitter is implemented in 0.13-/spl mu/m CMOS technology. An on-die 10-GHz LC oscillator phase-locked loop (PLL) creates two sinusoidal 10-GHz complementary clock phases as well as eight 2.5-GHz interleaved feedback divider clock phases. After a 2/sup 20/-1 pseudorandom bit sequence generator (PRBS) creates eight 2.5-Gb/s data streams, the eight 2.5-GHz interleaved clocks 4:1 multiplex the eight 2.5-Gb/s data streams to two 10-Gb/s data streams. 10-GHz analog sample-and-hold circuits retime the two 10-Gb/s data streams to be in phase with the 10-GHz complementary clocks. Two-tap equalization of the 10-Gb/s data streams compensate for bandwidth rolloff of the 10-Gb/s data outputs at the 10-GHz analog latches. A final 20-Gb/s 2:1 output multiplexer, clocked by the complementary 10-GHz clock phases, creates 20-Gb/s data from the two retimed 10-Gb/s data streams. The LC-VCO is integrated with the output multiplexer and analog latches, resonating the load and eliminating the need for clock buffers, reducing power supply induced jitter and static phase mismatch. Power, active die area, and jitter (rms/pk-pk) are 165 mW, 650 /spl mu/m/spl times/350 /spl mu/m, and 2.37 ps/15 ps, respectively.  相似文献   

12.
In this paper, we present a novel 128/64 point fast Fourier transform (FFT)/ inverse FFT (IFFT) processor for the applications in a multiple-input multiple-output orthogonal frequency-division multiplexing based IEEE 802.11n wireless local area network baseband processor. The unfolding mixed-radix multipath delay feedback FFT architecture is proposed to efficiently deal with multiple data sequences. The proposed processor not only supports the operation of FFT/IFFT in 128 points and 64 points but can also provide different throughput rates for 1-4 simultaneous data sequences to meet IEEE 802.11n requirements. Furthermore, less hardware complexity is needed in our design compared with traditional four-parallel approach. The proposed FFT/IFFT processor is designed in a 0.13-mum single-poly and eight-metal CMOS process. The core area is 660times2142 mum2 , including an FFT/IFFT processor and a test module. At the operation clock rate of 40 MHz, our proposed processor can calculate 128-point FFT with four independent data sequences within 3.2 mus meeting IEEE 802.11n standard requirements  相似文献   

13.
A 14-bit analog-to-digital converter (ADC) design for GSM/GPRS/EDGE handsets is implemented in 0.25 /spl mu/m CMOS. The measured SNR/SNDR/DR is 85.2/84.1/88 dB respectively. The modulator and the clock generator consume 1.05 mA from 2.7 V supply. A delay-locked-loop (DLL)-based bias scheme is implemented to guarantee that amplifier slewing takes a fixed percentage of the clock cycle over process corners, temperature, and clock frequency. The proposed biasing scheme is shown to minimize settling error variations and contain design margins.  相似文献   

14.
A feedforward compensation scheme with no Miller capacitors is proposed to overcome the bandwidth limitations of traditional Miller compensation schemes. The technique has been used in the design of an operational transconductance amplifier (OTA) with a dc gain of 80 dB, gain bandwidth of 1.4 GHz, phase margin of 62/spl deg/, and 2 ns settling time for 2-pF load capacitor in a standard 0.35-/spl mu/m CMOS technology. The OTA's current consumption is 4.6 mA. The OTA is used in the design of a fourth-order switched-capacitor bandpass /spl Sigma//spl Delta/ modulator with a clock frequency of 92 MHz. It achieves a peak signal-to-noise ratio of 80 and 54 dB for 270-kHz (GSM) and 3.84-MHz (CDMA) bandwidths, respectively and consumes 19 mA of current from a /spl plusmn/1.25-V supply.  相似文献   

15.
This paper presents a baseband processor architecture for pulsed ultra-wideband signals. It consists of an analog-to-digital converter (ADC), a clock generation system, and a digital back-end. The clock generation system provides different phases of a 300-MHz clock using four differential inverter stages. The specification of the jitter standard deviation is 100 ps. The Flash interleaved ADC provides four bit samples at 1.2 Gsps. The back-end uses parallelization to process these samples and to reduce the signal acquisition time to 65 /spl mu/s. The entire synchronization algorithm is implemented in the digital domain, without feeding any signals back to the clock control. The baseband processor and ADC were implemented on the same 0.18-/spl mu/m CMOS die at 1.8 V as part of a complete baseband transceiver. A wireless data rate of 193 kb/s is demonstrated.  相似文献   

16.
This quad-issue processor achieves 1-GHz operation through improved dynamic circuit techniques in critical paths and a more extensive on-chip memory system which scales in both bandwidth and latency. Critical logic paths use domino, delayed clocked domino, and logic embedded in dynamic flip-flops for minimum delay. A 64-KB sum-addressed memory data cache combines the address offset add with the cache decode, allowing the average memory latency to scale by more than the clock ratio. Memory bandwidth is improved by using wave pipelined SRAM designs for on-chip caches and a write cache for store traffic. Memory power is controlled without increased latency by use of delayed-reset logic decoders. The chip operates at 1000 MHz and dissipates less than 80 W from a 1.6-V supply. It contains 23 million transistors (12 million in RAM cells) on a 244 mm2 die  相似文献   

17.
A new unique conversion technique named the `Penta-Phase Integration' method, applied to a single-chip C/SUP 2/MOS 12-bit analog-to-digital converter designed for microprocessor system, is introduced and described. The newly developed device, fabricated with a standard metal gate CMOS process including an 8-channel multiplexer and TTL compatibility, has several features: unipolar- and ratiometric-conversion can be performed; conversion accuracy within /spl plusmn/0.05 percent of full scale over the -35/spl deg/C-+85/spl deg/C temperature range can be obtained; conversion time is 1.1 ms at a 20 MHz clock frequency, and the device can be operated with a single 5 V power supply and 6 mW power consumption at a 4 MHz clock frequency. The new technique essentially incorporated several methods which divide one conversion cycle into five-phases, accomplish minimization of the error caused by comparator response delay, provide several narrow flat phases to eliminate switching errors due to parasitic capacitance, and enable high clock frequency operation in digital circuits by utilizing C/SUP 2/MOS circuit technology and a synchronized configuration for counters.  相似文献   

18.
An all-digital delay-locked loop (DLL) and an all-digital pulsewidth-control loop (PWCL) with adjustable duty cycles are presented. For the DLL, by using the flash time-to-digital conversion, both the phase alignment and the duty cycle of the output clock are assured in 10 cycles. For the PWCL, the sequential time-to-digital conversion is adopted to reduce the required D-flip-flops and lock within 28 cycles. For both of the proposed circuits, the requirement of the input clock with 50% duty cycle is eliminated. The proposed circuits have been fabricated in a 0.35-/spl mu/m CMOS process. The proposed DLL generates the output clock with the duty cycle of 25%, 50% and 75%, and the operation frequency range is from 140 to 260 MHz. For the proposed PWCL, the duty cycle is adjusted from 30% to 70% in steps of 10%. The operation frequency range is from 400 to 600 MHz.  相似文献   

19.
The 18-way set-associative, single-ported 9 MB cache for the Itanium 2 processor uses 210 identical 48-kB sub-arrays with a 2.21-/spl mu/m/sup 2/ cell in a 130-nm 6-metal technology. The processor runs at 1.7 GHz at 1.35 V and dissipates 130 W. The 432-mm/sup 2/ die contains 592 M transistors, the largest transistor count reported for a microprocessor. This paper reviews circuit design and implementation details for the L3 cache data and tag arrays. The staged mode ECC scheme avoids a latency increase in the L3 tag. A high V/sub t/ implant improves the read stability and reduces the sub-threshold leakage.  相似文献   

20.
A 1.3-GHz fifth-generation SPARC64 microprocessor   总被引:1,自引:0,他引:1  
A fifth-generation SPARC64 processor is fabricated in 130-nm partially depleted silicon-on-insulator CMOS with eight layers of Cu metallization. At V/sub dd/ = 1.2 V and T/sub a/ = 25/spl deg/C, it runs at 1.3 GHz and dissipates 34.7 W. The chip contains 191 M transistors with 19 M logic circuits in an area of 18.14 mm /spl times/ 15.99 mm and is covered with 5858 bumps, of which 269 are for I/O signals. It is mounted in a 1360-pin land-grid-array package. The 16-byte-wide system bus operates with a 260-MHz clock in single-data-rate or double-data-rate modes. This processor implements an error-detection mechanism for execution units and data path logic circuits in addition to on-chip arrays to detect data corruption. Intermittent errors detected in execution units and data paths are recovered via instruction retry. A soft barrier clocking scheme allows amortization of the clock skew and jitter over multiple cycles and helps to achieve high clock frequency. Tunability of the clock timing makes timing closure easier. A relatively small amount of custom circuit design and the use of mostly static circuits contributes to achieve short development time.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号