首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Describes a 256-word × 32-bit 4-read, 4-write ported register file for 6-GHz operation in 1.2-V 130-nm technology. The local bitline uses a pseudostatic technique for aggressive bitline active leakage reduction/tolerance to enable 16 bitcells/bitline, low-Vt usage, and 50% keeper downsizing. Gate-source underdrive of -V cc on read-select transistors is established without additional supply/bias voltages or gate-oxide overstress. 8% faster read performance and 36% higher dc noise robustness is achieved compared to dual-Vt bitline scheme optimized for high performance. Device-level measurements in the 130-nm technology show 703× bitline active leakage reduction, enabling continued Vt scaling and robust bitline scalability beyond 130-nm generation. Sustained performance and robustness benefit of the pseudostatic technique against conventional dynamic bitline with keeper-upsizing is also presented  相似文献   

2.
This paper describes a 32-bit address generation unit designed for 4-GHz operation in 1.2-V 130-nm technology. The AGU utilizes a 152-ps sparse-tree adder core to achieve 20% delay reduction, 80% lower interconnect complexity, and a low (1%) active energy leakage component. The dual-V/sub T/ semidynamic implementation of the adder core provides the performance of a dynamic CMOS design with an average energy profile similar to static CMOS, enabling 71% savings in average energy with a good sub-130-nm scaling trend.  相似文献   

3.
This 130-nm Itanium 2 processor implements the explicitly parallel instruction computing (EPIC) architecture and features an on-die 6-MB 24-way set-associative level-3 cache. The 374-mm/sup 2/ die contains 410 M transistors and is implemented in a dual-V/sub t/ process with six Cu interconnect layers and FSG dielectric. The processor runs at 1.5 GHz at 1.3 V and dissipates a maximum of 130 W. This paper reviews circuit design and package details, power delivery, the reliability, availability, and serviceability (RAS) features, design for test (DFT), and design for manufacturability (DFM) features, as well as an overview of the design and verification methodology. The fuse-based clock deskew circuit achieves 24-ps skew across the entire die, while the scan-based skew control further reduces it to 7 ps. The 128-bit front-side bus has a bandwidth of 6.4 GB/s and supports up to four processors on a single bus.  相似文献   

4.
This work describes an aggressive SRAM cell configuration using dual-V/sub T/ and minimum channel length to achieve high performance. A bitline leakage reduction technique is incorporated into an L1 cache design using the new cell in a 100-nm dual-V/sub T/ technology to eliminate impacts of bitline leakage on performance and noise margin with minimal area overhead. Bitline delay is 23% better than the best conventional design, thus enabling 6-GHz operation at with 15% higher energy.  相似文献   

5.
A 4-MB L2 data cache was implemented for a 64-bit 1.6-GHz SPARC(r) RISC microprocessor. Static sense amplifiers were used in the SRAM arrays and for global data repeaters, resulting in robust and flexible timing operation. Elimination of the global clock grid over the SRAM array saves power, enabled by combining the clock information with array select signals. Redundancy was implemented flexibly, with shift circuits outside the main data array for area efficiency. The chip integrates 315 million transistors and uses an 8-metal-layer 90-nm CMOS process.  相似文献   

6.
A power efficient 26-GHz 32:1 static frequency divider in 130-nm bulk CMOS   总被引:2,自引:0,他引:2  
A 32:1 static frequency divider consisting of five stages of 2:1 dividers using current mode logic (CML) was fabricated in a 130-nm bulk complementary metal-oxide semiconductor (CMOS) logic process. By optimizing transistors size, high operating speed is achieved with limited power consumption. For an input power of 0dBm, the 32:1 divider operates up to 26GHz with a 1.5-V supply voltage. The whole 32:1 chain including buffers consumes 8.97mW and the first stage consumes only 3.88mW at a 26-GHz operation. The power consumption of the first 2:1 stage is less than 15% of other bulk CMOS static frequency dividers operating at the same frequency.  相似文献   

7.
A 1.5-V 256-263 8-modulus prescaler and a 1.5-V integer-N phase-locked loop (PLL) with eight different output frequencies have been implemented in a 0.13-mum foundry CMOS process. The synchronous divide-by-4/5 circuit uses current mode logic (CML) D-flip-flops with resistive loads to achieve 21-GHz maximum operating frequency at input power of 0 dBm. The divider is used to implement an 8-modulus prescaler consuming 6-mA current and 9-mW power. This extremely low power consumption is achieved by radically decreasing the sizes of transistors in the divider. Utilizing the prescaler, a charge-pump integer-N PLL has been demonstrated with 20-GHz output frequency. The in-band phase noise of the PLL at 60-kHz offset and out-of-band phase noise at 10-MHz offset are ~-80 dBc/Hz and -116.1 dBc/Hz, respectively. The locking range is from 20.05 to 21 GHz. The PLL consumes 15-mA current and 22.5-mW power from a 1.5-V power supply.  相似文献   

8.
The design of a 90-nm virtually addressed cache subsystem with separate 32-kB instruction and data caches is described. The circuits and microarchitecture are illustrated, including architecture level trace data validating low-power features and provisions to support snooping while maintaining the latency and power of virtual addressing. Low-power memory management unit design including a translation lookaside buffer with process identifier mapping is also described. Level 1 caches with support for high bandwidth, single cycle 256 bit fill and evict, as well as features for low power are also described. The design approaches are validated through both simulation and experimental results.  相似文献   

9.
The 18-way set-associative, single-ported 9 MB cache for the Itanium 2 processor uses 210 identical 48-kB sub-arrays with a 2.21-/spl mu/m/sup 2/ cell in a 130-nm 6-metal technology. The processor runs at 1.7 GHz at 1.35 V and dissipates 130 W. The 432-mm/sup 2/ die contains 592 M transistors, the largest transistor count reported for a microprocessor. This paper reviews circuit design and implementation details for the L3 cache data and tag arrays. The staged mode ECC scheme avoids a latency increase in the L3 tag. A high V/sub t/ implant improves the read stability and reduces the sub-threshold leakage.  相似文献   

10.
In this paper, we present the design of a 32-b arithmetic and log unit (ALU) that allows low-power operation while supporting a design-for-test (DFT) scheme for delay-fault testability. The low-power techniques allow for 18% reduction in ALU total energy for 180-nm bulk CMOS technology with minimal performance degradation. In addition, there is a 22% reduction in standby mode leakage power and 23% lower peak current demand. In the test mode, we employ a built-in DFT scheme that can detect delay faults while reducing the test-mode automatic test equipment clock frequency.  相似文献   

11.
The clock generation and distribution system for the 130-nm Itanium 2 processor operates at 1.5 GHz with a skew of 24 ps. The Itanium 2 processor features 6 MB of on-die L3 cache and has a die size of 374 mm/sup 2/. Fuse-based clock de-skew enables post-silicon clock optimization to gain higher frequency. This paper describes the clock generation, global clock distribution, local clocking, and the clock skew optimization feature.  相似文献   

12.
The implemented static frequency divider provides quadrature (Q) clock outputs and divides frequencies up to 44GHz. The core divider circuit consists of two current-mode logic (CML) latches and consumes 3.2mW from a 1.1-V supply. The divided outputs result in a peak-to-peak and rms jitter of 6.3 and 0.8ps, respectively, and the maximum phase mismatch between the in-phase (I) and Q-outputs amounts to 1ps at an input frequency of 40GHz. The high division frequency is achieved by employing resistive loads, inductive peaking, and optimizing the circuit layout for reduced parasitic capacitances in the latches. The core divider consumes a chip area of 30/spl mu/m/spl times/40/spl mu/m only.  相似文献   

13.
A 4-Mb cache dynamic random access memory (CDRAM), which integrates 16-kb SRAM as a cache memory and 4-Mb DRAM into a monolithic circuit, is described. This CDRAM has a 100-MHz operating cache, newly proposed fast copy-back (FCB) scheme that realizes a three times faster miss access time over with the conventional copy-back method, and maximized mapping flexibility. The process technology is a quad-polysilicon double-metal 0.7-μm CMOS process, which is the same as used in a conventional 4-Mb DRAM. The chip size of 82.9 mm2 is only a 7% increase over the conventional 4-Mb DRAM. The simulated system performance indicated better performance than a conventional cache system with eight times the cache capacity  相似文献   

14.
This paper describes the main features and functions of the Pentium(R) 4 processor microarchitecture. We present the front-end of the machine, including its new form of instruction cache called the trace cache, and describe the out-of-order execution engine, including a low latency double-pumped arithmetic logic unit (ALU) that runs at 4 GHz. We also discuss the memory subsystem, including the low-latency Level 1 data cache that is accessed in two clock cycles. We then describe some of the key features that contribute to the Pentium(R) 4 processor's floating-point and multimedia performance. We provide some key performance numbers for this processor, comparing it to the Pentium(R) III processor  相似文献   

15.
We present a low-jitter digital LC phase-locked loop (PLL) in a standard digital 130-nm CMOS technology, aiming at, but not limited to, clock multiplication in high-speed digital serial interface transceivers. The PLL features a fully digital core and a digitally controlled LC oscillator. The use of an integrated programmable coil enables triple-band operation in multi-GHz range (2.2, 3.4, and 4.6 GHz) on a die area as small as 0.21 mm/sup 2/. A new architecture is proposed which improves the authors' previous work and allows to achieve an outstanding long-term jitter lower than 650 fs over the whole frequency range. The PLL consumes 13 mA of current at 1.5-V supply. Its performances compete favorably with the most advanced analog PLLs and are ahead of digital PLLs. Its digital nature makes it easily realizable in the mainstream digital CMOS technologies, robust against noise, and thus ideal for application as a low-jitter clock multiplying unit in digital intensive systems on chip.  相似文献   

16.
This paper presents a power-efficient CMOS frequency divider (FD) with wide-band programmable division ratio and quadrature outputs for high-speed data transmission applications. The proposed FD consists of a power-efficient programmable divider (PD), a complementary volt-age-to-time converter based duty-cycle correction circuit, and a compact quadrature divider (QD). In the chain of PD, a sense-amplifier based dynamic flip-flop is proposed for 2/3 divider cell to achieve high-speed operation with significantly reduced power consumption. In addition, a simple but effective QD based on two pseudo-differential voltage-controlled tri-state inverters, is beneficial for generating precise quadrature output signals. Measurement results in 40-nm CMOS process show that the proposed FD achieves a wide division range from 16 to 254 and operates up to 14.8 GHz while consuming the power of 540.6 μW at 1.1-V supply, and occupying the active area of 0.00267 mm2 (114.6 μm × 23.3 μm).  相似文献   

17.
A 1-Mb dynamic RAM has been fabricated using 1.2-/spl mu/m double-level metal CMOS technology. A novel divided bitline matrix architecture allows the conventional double-polysilicon planar memory cell to be used without sacrificing signal-to-noise (S/N) ratio or die efficiency. Optimized for high bandwidth, the device uses static column circuitry and a 256K/spl times/4 organization to achieve data rates >180 Mb/s at worst-case voltage and temperature conditions. The 5.97-mm/spl times/11.4-mm die incorporates a flexible laser blown fuse link redundancy scheme which can repair a wide variety of fabrication defects. Typical row access and cycle times are 85 and 190 ns, respectively, achieving >21-Mb/s bandwidth in the non-optimized row access mode. Although some DC power is dissipated in static circuitry, active power consumption has been kept to 225 mW (45 mA), and standby power consumption has been reduced to 2.5 mW (0.5 mA).  相似文献   

18.
In a 0.13-/spl mu/m CMOS logic compatible process, a 256K /spl times/ 32 bit (8 Mb) local SONOS embedded flash EEPROM was implemented using the ATD-assisted current sense amplifier (AACSA) for 0.9 V (0.7 /spl sim/ 1.4 V) low V/sub CC/ application. Read operation is performed at a high frequency of 66 MHz and shows a low current of typically 5 mA at 66-MHz operating frequency. Program operation is performed for common source array with wide I/Os (/spl times/32) by using the data-dependent source bias control scheme (DDSBCS). This novel local SONOS embedded flash EEPROM core has the cell size of 0.276 /spl mu/m/sup 2/ (16.3 F/sup 2//bit) and the program and erase time of 20 /spl mu/s and 20 ms, respectively.  相似文献   

19.
This paper describes a single-cycle 64-bit integer execution ALU fabricated in 90-nm dual-Vt CMOS technology, operating at 4 GHz in the 64-bit mode with a 32-bit mode frequency of 7 GHz (measured at 1.3 V, 25/spl deg/ C). The lower- and upper-order 32-bit domains operate on separate off-chip supply voltages, enabling conditional turn-on/off of the 64-bit ALU mode operation and efficient power-performance optimization. High-speed single-rail dynamic circuit techniques and a sparse-tree semi-dynamic adder architecture enable a dense layout occupying 280 /spl times/ 260 /spl mu/m/sup 2/ while simultaneously achieving: (i) low carry-merge fan-outs and inter-stage wiring complexity; (ii) low active leakage and dynamic power consumption; (iii) high DC noise robustness with maximum low-Vt usage; (iv) single-rail dynamic-compatible ALU write-back bus; (v) simple 2/spl Phi/ 50% duty-cycle timing plan with seamless time-borrowing across phases; (vi) scalable 64-bit ALU performance up to 7 GHz measured at 2.1 V, 25/spl deg/ C; and (vii) scalable 32-bit ALU performance up to 9 GHz measured at 1.68 V, 25/spl deg/ C.  相似文献   

20.
A 1.8-V-only 32-Mb NOR flash EEPROM has been developed based on the 0.25-μm triple-well double-metal CMOS process. A channel-erasing scheme has been implemented to realize a cell size of 0.49 μm2 , the smallest yet reported for 0.25-μm CMOS technology. A block decoder circuit with a novel erase-reset sequence has been designed for the channel-erasing operation. A bitline direct sensing scheme and a wordline boosted voltage pooling method have been developed to obtain high-speed reading operation at low voltage. An access time of 90 ns at 1.8 V has been realized  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号