首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In this paper, we propose using joint equalization and coding to improve on-chip communication speeds by signaling at rates beyond the rate governed by resistance-capacitance (RC) delay of the interconnect. Operating beyond the RC limit introduces inter-symbol interference (ISI). We mitigate the effects of ISI by employing equalization. The proposed equalizer employs a variable threshold inverter whose switching threshold is modified as a function of past output of the bus. We demonstrate even higher speedups by combining equalization with crosstalk avoidance coding. Specifically, simulation results for a 10-mm 32-bit bus in 0.13-mum CMOS technology show that 1.28 speedup is achievable by equalization alone and 2.30 speedup is achievable by joint equalization and coding.  相似文献   

2.
Efficient RC low-power bus encoding methods for crosstalk reduction   总被引:1,自引:0,他引:1  
In on-chip buses, the RC crosstalk effect leads to serious problems, such as wire propagation delay and dynamic power dissipation. This paper presents two efficient bus-coding methods. The proposed methods simultaneously reduce more dynamic power dissipation and wire propagation delay than existing bus encoding methods. Our methods also reduce more total power consumption than other encoding methods. Simulation results show that the proposed method I reduces coupling activity by 26.7-38.2% and switching activity by 3.7%-7% on 8-bit to 32-bit data buses, respectively. The proposed method II reduces coupling activity by 27.5-39.1% and switching activity by 5.3-9% on 8-bit to 32-bit data buses, respectively. Both the proposed methods reduce dynamic power by 23.9-35.3% on 8-bit to 32-bit data buses and total propagation delay by up to 30.7-44.6% on 32-bit data buses, and eliminate the Type-4 coupling. Our methods also reduce total power consumption by 23.6-33.9%, 23.9-34.3%, and 24.1-34.6% on 8-bit to 32-bit data buses with the 0.18, 0.13, and 0.09 μm technologies, respectively.  相似文献   

3.
A 256-bit/spl times/4-bit static RAM working on a supply voltage down to 1.2 V is described. A serial interface for the address and the data with a 4-bit bus reduces the pincount of the RAM to only 8. Special design techniques to reach the design goal-very low power at a reasonable circuit speed-are discussed in detail. The device is fabricated in a low power silicon gate CMOS process. An operating power of 500 /spl mu/W/MHz and a standby power of less than 1 /spl mu/W at 1.5 V supply voltage was achieved. With this serial interface a cycle time of 1 /spl mu/s at 1.5 V was measured.  相似文献   

4.
在大规模集成电路工艺的深亚微米时代,片上网络(NoC)互连总线遭遇了来自三个方面的威胁:功耗、传输延时、可靠性,它们已经成为限制NoC性能提高的瓶颈。鉴于总线编码的灵活性和综合处理能力,本文首先分析了多种分别针对功耗、延时、可靠性问题的总线编码处理方案,最后介绍了一种统一的总线编码框架来综合处理这三方面的挑战。  相似文献   

5.
NOC(片上网络)的体系结构解决了SOC(片上系统)在大规模集成IP核时面临的一些难题,但其串扰问题对电路性能的影响也越来越明显。基于DSM(深亚微米)下的总线模型,分析了信号串扰引起的总线延时问题,同时比较了3种减小总线串扰的编码方案。并采用0.13μmCMOS工艺对性能较优的DAP编码方案进行了电路仿真,得到了不同长度和宽度下的总线延时。结果表明,采用减少信号串扰的编码方法可以有效地降低总线的串扰,减少信号延时,这一效果当总线较宽或走线较长时尤其明显,同时也证明了0.13μmCMOS工艺下电路仿真结果与理论计算结果的一致性。  相似文献   

6.
The use of deep-submicrometer (DSM) technology increases the capacitive coupling between adjacent wires leading to severe crosstalk noise, which causes power dissipation and may also lead to malfunction of a chip. In this paper, we present a technique that reduces crosstalk noise on instruction buses. While previous research focuses primarily on address buses, little work can be applied efficiently to instruction buses. This is due to the complex transition behavior of instruction streams. Based on instruction sequence profiling, we exploit an architecture that encodes pairs of bus wires and permute them in order to optimize power and noise. A close to optimal architecture configuration is obtained using a genetic algorithm. Unlike previous bus encoding approaches, crosstalk reduction can be balanced with delay and area overhead. Moreover, if delay (or area) is most critical, our architecture can be tailored to add nearly no overhead to the design. For our experiments, we used instruction bus traces obtained from 12 SPEC2000 benchmark programs. The results show that our approach can reduce crosstalk up to 50.79% and power consumption up to 55% on instruction buses.  相似文献   

7.
Capacitive crosstalk between adjacent signal wires has significant effect on performance and delay uncertainty of point-to-point on-chip buses in deep submicrometer (DSM) VLSI technologies. We propose a hybrid polarity repeater insertion technique that combines inverting and non-inverting repeater insertion to achieve constant average effective coupling capacitance per wire transition for all possible switching patterns. Theoretical analysis shows the superiority of the proposed method in terms of performance and delay uncertainty compared to conventional and staggered repeater insertion methods. Simulations at the 90-nm node on semi-global METAL5 layer show around 25% reduction in worst case delay and around 86% delay uncertainty minimization compared to standard bus with optimal repeater configuration. The reduction in worst case capacitive coupling reduces peak energy which is a critical factor for thermal regulation and packaging. Isodelay comparisons with standard bus show that the proposed technique achieves considerable reduction in total buffers area, which in turn reduces average energy and peak current. Comparisons with staggered repeater which is one of the simplest and most effective crosstalk reduction techniques in the literature show that hybrid polarity repeater offers higher performance, less delay uncertainty, and reduced sensitivity to repeater placement variation.   相似文献   

8.
This paper presents a direct digital frequency synthesizer (DDFS) with a 16-bit accumulator, a fourth-order phase domain single-stage /spl Delta//spl Sigma/ interpolator, and a 300-MS/s 12-bit current-steering DAC based on the Q/sup 2/ Random Walk switching scheme. The /spl Delta//spl Sigma/ interpolator is used to reduce the phase truncation error and the ROM size. The implemented fourth-order single-stage /spl Delta//spl Sigma/ noise shaper reduces the effective phase bits by four and reduces the ROM size by 16 times. The DDFS prototype is fabricated in a 0.35-/spl mu/m CMOS technology with active area of 1.11mm/sup 2/ including a 12-bit DAC. The measured DDFS spurious-free dynamic range (SFDR) is greater than 78 dB using a reduced ROM with 8-bit phase, 12-bit amplitude resolution and a size of 0.09 mm/sup 2/. The total power consumption of the DDFS is 200mW with a 3.3-V power supply.  相似文献   

9.
A shared n-well layout technique is developed for the design of dual-supply-voltage logic blocks. It is demonstrated on a design of a 64-bit arithmetic logic unit (ALU) module in domino logic. The second supply voltage is used to lower the power of noncritical paths in the sparse, radix-4 64-bit carry-lookahead adder and in the loopback bus. A 3 mm/sup 2/ test chip in 0.18-/spl mu/m 1.8-V five-metal with local interconnect CMOS technology that contains six ALUs and test circuitry operates at 1.16 GHz at the nominal supply. For target delay increase of 2.8% energy savings are 25.3% using dual supplies, while for 8.3% increase in delay, 33.3% can be saved.  相似文献   

10.
A 640-Mb/s 2048-bit programmable LDPC decoder chip   总被引:3,自引:0,他引:3  
A 14.3-mm/sup 2/ code-programmable and code-rate tunable decoder chip for 2048-bit low-density parity-check (LDPC) codes is presented. The chip implements the turbo-decoding message-passing (TDMP) algorithm for architecture-aware (AA-)LDPC codes which has a faster convergence rate and hence a throughput advantage over the standard decoding algorithm. It employs a reduced complexity message computation mechanism free of lookup tables, and features a programmable network for message interleaving based on the code structure. The chip decodes any mix of 2048-bit rate-1/2 (3,6)-regular AA-LDPC codes in standard mode by programming the network, and attains a throughput of 640 Mb/s at 125 MHz for 10 TDMP-decoding iterations. In augmented mode, the code rate can be tuned up to 14/16 in steps of 1/16 by augmenting the code. The chip is fabricated in 0.18-/spl mu/m six-metal-layer CMOS technology, operates at a peak clock frequency of 125 MHz at 1.8 V (nominal), and dissipates an average power of 787 mW.  相似文献   

11.
A fast, low-power 32K/spl times/8-bit CMOS static RAM with a high-resistive polyload 4-transistor cell has been developed utilizing a dynamic double word line (DDWL) scheme. This scheme combines an automatic power down circuitry and double word line structure. The DDWL, together with bit line and sense line equilibration, reduces the core area delay time and operating power to about 1/2 and 1/15 that of a conventional device, respectively. A newly developed fault-tolerant circuitry improves fabrication yield without degrading the access time. As for a fabrication process, an advanced 1.2-/spl mu/m p-well CMOS technology is developed to realize the ULSI RAM, integrating 1,600,000 elements on a 6.68/spl times/8.86 mm/SUP 2/ chip. The RAM offers, typically, 46 ns access time, 10 mW operating power and 30 /spl mu/W standby power.  相似文献   

12.
This paper describes a single-cycle 64-bit integer execution ALU fabricated in 90-nm dual-Vt CMOS technology, operating at 4 GHz in the 64-bit mode with a 32-bit mode frequency of 7 GHz (measured at 1.3 V, 25/spl deg/ C). The lower- and upper-order 32-bit domains operate on separate off-chip supply voltages, enabling conditional turn-on/off of the 64-bit ALU mode operation and efficient power-performance optimization. High-speed single-rail dynamic circuit techniques and a sparse-tree semi-dynamic adder architecture enable a dense layout occupying 280 /spl times/ 260 /spl mu/m/sup 2/ while simultaneously achieving: (i) low carry-merge fan-outs and inter-stage wiring complexity; (ii) low active leakage and dynamic power consumption; (iii) high DC noise robustness with maximum low-Vt usage; (iv) single-rail dynamic-compatible ALU write-back bus; (v) simple 2/spl Phi/ 50% duty-cycle timing plan with seamless time-borrowing across phases; (vi) scalable 64-bit ALU performance up to 7 GHz measured at 2.1 V, 25/spl deg/ C; and (vii) scalable 32-bit ALU performance up to 9 GHz measured at 1.68 V, 25/spl deg/ C.  相似文献   

13.
This 130-nm Itanium 2 processor implements the explicitly parallel instruction computing (EPIC) architecture and features an on-die 6-MB 24-way set-associative level-3 cache. The 374-mm/sup 2/ die contains 410 M transistors and is implemented in a dual-V/sub t/ process with six Cu interconnect layers and FSG dielectric. The processor runs at 1.5 GHz at 1.3 V and dissipates a maximum of 130 W. This paper reviews circuit design and package details, power delivery, the reliability, availability, and serviceability (RAS) features, design for test (DFT), and design for manufacturability (DFM) features, as well as an overview of the design and verification methodology. The fuse-based clock deskew circuit achieves 24-ps skew across the entire die, while the scan-based skew control further reduces it to 7 ps. The 128-bit front-side bus has a bandwidth of 6.4 GB/s and supports up to four processors on a single bus.  相似文献   

14.
This paper describes an adaptive bandwidth bus (ABB) architecture based on hybrid current/voltage mode repeaters for long global RC interconnect static busses that achieves high-data rates while minimizing the static power dissipation associated with current-mode signaling. Attaining a maximum aggregate bandwidth of 16 Gb/s (i.e., 1 Gb/s per line) across lossy on-chip interconnects spanning 1.75 cm in length, the bus core fabricated in 0.35 /spl mu/m CMOS technology dissipates approximately 93 mW with a supply of 2.5 V and signal activity of 0.5, equivalent to 5.71 pJ/bit. Experimental results using a 16-bit reference bus design that can be externally programmed to operate in voltage, current or adaptive modes indicate a 50% reduction in power dissipation over current-mode (CM) sensing, and an improvement in interconnection delay and signaling bandwidth of 35%-70% and 66% over voltage-mode (VM) sensing, respectively.  相似文献   

15.
A dual-core 64-bit ultraSPARC microprocessor for dense server applications   总被引:1,自引:0,他引:1  
A dual-core 64-bit microprocessor optimized for compute-dense systems such as rack-mount and blade servers for network computing was developed. The chip consists of two UltraSPARC II cores, each with its own 512 kB L2 cache, a DDR-1 memory controller, and symmetric multiprocessor bus (JBus) controllers. The 206-mm/sup 2/ die is fabricated in 0.13-/spl mu/m CMOS technology with seven layers of Cu and a low-k dielectric. The chip offers a highly efficient performance-per-watt ratio with a typical power dissipation of 23 W at 1.3 V and 1.2 GHz. A short design cycle was achieved by leveraging existing designs wherever possible and developing effective design methodologies and flows. Significant design challenges faced by this project are described. These include deep-submicron design issues, such as negative bias temperature instability (NBTI), leakage, coupling noise, intra-die process variation, and electromigration (EM). A second important design challenge was implementing a high-performance L2 cache subsystem with a short four-cycle core-to-L2 latency including ECC.  相似文献   

16.
Time jitter in continuous-time /spl Sigma//spl Delta/ modulators is a known limitation on the maximum achievable signal-to-noise-ratio (SNR). Analysis of time jitter in this type of converter shows that a switched-capacitor (SC) feedback digital-to-analog converter (DAC) reduces the sensitivity to time jitter significantly. In this paper, an I and Q continuous-time fifth-order /spl Sigma//spl Delta/ modulator with 1-bit quantizer and SC feedback DAC is presented, which demonstrates the improvement in maximum achievable SNR when using an SC instead of a switched-current (SI) feedback circuit. The modulator is designed for a GSM/CDMA2000/UMTS receiver and achieves a dynamic range of 92/83/72 dB in 200/1228/3840 kHz, respectively. The intermodulation distance IM2, 3 is better than 87 dB in all modes. Both the I and Q modulator consumes a power of 3.8/4.1/4.5 mW at 1.8 V. Processed in 0.18-/spl mu/m CMOS, the 0.55-mm/sup 2/ integrated circuit includes a phase-locked loop, two oscillators, and a bandgap.  相似文献   

17.
This paper presents an analysis of how the power dissipation of on-chip buses is affected by introducing a relative delay between the switching lines. Relative delay is shown to reduce the dissipated power of oppositely switching lines while causing a power penalty for similarly switching lines. A new low-power bus scheme that uses this effect is proposed and analyzed. As the introduced delay increases, the achieved power reduction increases while decreasing the bus throughput. Thus, a tradeoff between power reduction and throughput is required when selecting the imposed relative delay. The proposed low-power scheme, dynamic delayed line bus (DDL) scheme, led to a power reduction of up to 25%, 33%, and 42% when applied to data, address, and differential buses, respectively. Simple DDL hardware is designed and implemented in a 0.18-/spl mu/m TSMC CMOS technology and applied to a 4500-/spl mu/m long Metal4 bus. Circuit simulation results for different bus widths are presented.  相似文献   

18.
A technique to reduce in-band tones in switch-mode power supplies is described. It takes advantage of the noise-shaping properties of the delta-sigma (/spl Delta//spl Sigma/) modulator to eliminate the spikes normally present in switching power supplies. A framework is introduced for comparing the conventional pulsewidth modulated (PWM) controller and this approach. A buck converter test circuit is constructed that is designed for a PWM controller clocked at 200 kHz and then substituted with a /spl Delta//spl Sigma/ modulator controller clocked at 400 kHz. The RMS noise power of the PWM controller is 14.9 mW compared to the rms noise power for the /spl Delta//spl Sigma/ modulator of 75.85 mW measured in a 2-MHz bandwidth. Although the /spl Delta//spl Sigma/ modulator rms noise power is higher, the noise floor is below the tones seen at the output of the PWM controller. A multibit /spl Delta//spl Sigma/ modulator controller, however, provides a significant reduction in the spectral output of the power supply. Values of 3.75 and 0.24 mW rms noise power are observed at the output of a 2-bit and 4-bit /spl Delta//spl Sigma/ modulator controller, respectively.  相似文献   

19.
As CMOS technology size scales down, multiple cell upsets (MCUs) caused by a single radiation particle have become one of the most challenging reliability issues for memories used in space application. Error correction codes (ECCs) are commonly used to protect memories against errors. Single error correction-Double error detection (SEC-DED) codes are the simplest and most typical ones, but they can only corrected single errors. The advanced ECCs, which can provide enough protection for memories, cost more overhead due to their complex decoders. Orthogonal Latin square (OLS) codes are one type of one-step majority logic decodable (OS-MLD) codes that can be decoded with low complexity and delay. However, there are no OLS codes directly fitting 32-bit data, which is a typical data size in memories. In this paper, (55, 32) and (68, 32) codes derived from (45, 25) and (55, 25) OLS codes have been proposed in order to improve OLS codes in terms of protection for the 32-bit data. The proposed codes can maintain the correction capability of OLS codes and be decoded with low delay and complexity. The evaluation of the implementations for these codes are presented and compared with those of the shortened version (60, 32) and (76, 32) OLS codes. The results show that the area and power of a 2-bit MCUs immune radiation hardened SRAM that protected by the proposed codes have been reduced by 7.76% and 6.34%, respectively. In the case of a 3-bit MCUs immune, the area and power of whole circuits have been reduced by 8.82% and 4.56% when the proposed codes are used.  相似文献   

20.
Complementary MOS silicon-on-sapphire inverters fabricated using silicon-gate technology and 5-/spl mu/m channel-length devices has achieved nanosecond propagation delays and picojoule dynamic power-x delay products. In addition to high switching speed and low dynamic power, inverters with low leakage currents and therefore low quiescent power have been obtained. Two complex CMOS/SOS memories that realize the performance attributes of the individual inverters have been fabricated. An aluminium-gate 256-bit fully decoded static random-access memory features a typical access time of 50 ns at 10 V with a power dissipation of 0.4 /spl mu/W/bit (quiescent) and 10 /spl mu/W/bit (dynamic). The access time at 5 V is typically 95 ns. A silicon-gate 256-bit dynamic shift register features operation at clock signals of 200 MHz at 10 V and 75 MHz at 5 V. The dynamic power dissipation at 50 MHz and 5 V is typically 90 /spl mu/W/bit.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号