Similar literature
1.
A simple and very effective solution to the delay incurred while propagating data through long interconnection wires is presented. Such delays can be found in large VLSI/ULSI or wafer-scale systems. The basic idea of the technique is to fragment the wires and reconnect them with a special device, called a repeater, to form a bidirectional pipeline. A method for determining the optimum configuration of the pipeline is presented. It is shown that, even in the presence of appreciable skew in synchronous systems, the technique improves the transmission speed by 150% for 32-byte messages when a 10 cm 8-bit bus implemented in a 1.2 μm CMOS technology is used. The improvement increases for longer messages and for larger skews. It is also shown that the actual transmission time is close (to within a factor of 2) to the theoretical limit that could be achieved with a zero-length wire. A method based on repeaters operating at a multiple of the basic system clock frequency is also proposed. It is shown that this technique may speed up data transfer by an order of magnitude. The extension of the technique to asynchronous self-timed repeaters is also discussed. Finally, a VLSI implementation of the synchronous reconnection device is described.
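As a rough illustration of why fragmenting a long wire into a pipeline can pay off, the sketch below uses a toy timing model that is not taken from the paper. It assumes that distributed-RC wire delay grows with the square of the length, that every repeater stage is clocked with the same period, and that each stage carries a fixed overhead; the function names and the `overhead` parameter are hypothetical.

```python
def unsegmented_time(words, wire_delay):
    """Time to send `words` words when each must cross the whole wire in turn."""
    return words * wire_delay


def pipelined_time(words, wire_delay, stages, overhead):
    """Time to send `words` words through `stages` repeater-separated segments.

    Assumption: distributed-RC delay scales with length squared, so each of
    the `stages` segments has delay wire_delay / stages**2. `overhead` is a
    hypothetical per-stage cost (repeater latch delay plus clock skew margin).
    """
    period = wire_delay / stages ** 2 + overhead
    # The first word needs `stages` periods; each later word adds one more.
    return (stages + words - 1) * period


def best_stage_count(words, wire_delay, overhead, max_stages=64):
    """Brute-force search for the stage count with the smallest transfer time."""
    return min(range(1, max_stages + 1),
               key=lambda s: pipelined_time(words, wire_delay, s, overhead))
```

With one stage and zero overhead the pipelined model reduces to the unsegmented one, which is a quick sanity check. Under these assumptions, longer messages amortize the pipeline fill time over more words.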

2.
On optimal ordering of signals in parallel wire bundles
Optimal ordering and sizing of wires in a constrained-width interconnect bundle are studied in this paper. It is shown that among all possible orderings of signal wires, a monotonic order of the signals according to their effective driver resistance yields the smallest weighted average delay. Minimizing weighted average delay is a good approximation for MinMax delay optimization. Three variants of monotonic ordering are proven to be optimal, depending on the Miller coupling factors (MCF) ratio between the signals at the sides of the bundle and that of the internal wires. The monotonic order property holds for a very broad range of VLSI circuit settings arising in common design practice. A simple, yet near-optimal, setting of wire widths within the bundle to yield the best average weighted delay is proposed. The theoretical results have been validated by numerical experiments on 65 nm process technology and industrial design data. In all cases the ordering optimization yielded improvement in the range of 10% in wire delay, translated to about 5% improvement in the clock cycle of a high-performance microprocessor implemented in that technology.

3.
As IC fabrication technologies enter the nanometer era, clock routing increasingly dominates chip performance in terms of delay, cost, and power consumption. The X-architecture, which routes metal wires in diagonal as well as rectilinear directions, can address these challenges by reducing wirelength. In this paper, we present a clock routing algorithm, called PMXF, to construct an X-architecture zero-skew clock tree with minimum delay. An X-pattern library is defined to simplify the merging procedure of the DME approach, an X-Flip technique is proposed to reduce the wirelength between paired points, and a wire sizing technique is applied to achieve zero skew. Experimental results on benchmarks show that the proposed PMXF algorithm achieves greater reductions in clock delay, wirelength, power consumption, and via count than previous X-architecture clock routing algorithms.

4.
Capacitive crosstalk between adjacent signal wires has a significant effect on the performance and delay uncertainty of point-to-point on-chip buses in deep submicrometer (DSM) VLSI technologies. We propose a hybrid polarity repeater insertion technique that combines inverting and non-inverting repeater insertion to achieve constant average effective coupling capacitance per wire transition for all possible switching patterns. Theoretical analysis shows the superiority of the proposed method in terms of performance and delay uncertainty compared to conventional and staggered repeater insertion methods. Simulations at the 90-nm node on the semi-global METAL5 layer show around 25% reduction in worst-case delay and around 86% reduction in delay uncertainty compared to a standard bus with optimal repeater configuration. The reduction in worst-case capacitive coupling reduces peak energy, which is a critical factor for thermal regulation and packaging. Isodelay comparisons with a standard bus show that the proposed technique achieves a considerable reduction in total buffer area, which in turn reduces average energy and peak current. Comparisons with the staggered repeater, which is one of the simplest and most effective crosstalk reduction techniques in the literature, show that the hybrid polarity repeater offers higher performance, less delay uncertainty, and reduced sensitivity to repeater placement variation.
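The claim of a constant average effective coupling capacitance can be illustrated with a deliberately simplified model that is an assumption of this sketch, not the paper's analysis: two adjacent wires, each split into two equal segments by one repeater, with the Miller coupling factor (MCF) taken as 0, 1, or 2 for same-direction, quiet-neighbour, and opposite-direction switching. When the neighbour's repeater inverts and the victim's does not, the relative polarity flips in the second segment.

```python
from itertools import product


def mcf(a, b):
    """Miller coupling factor seen by a switching wire.

    a: transition of the wire itself, +1 (rising) or -1 (falling).
    b: transition of its neighbour, +1, -1, or 0 (quiet).
    Gives 0 for same direction, 1 for a quiet neighbour, 2 for opposite.
    """
    return 1 - a * b


def average_mcf(a, b, neighbour_inverts):
    """Average MCF over two equal segments separated by one repeater.

    If only the neighbour's repeater inverts (hybrid polarity, as modelled
    here), the neighbour's transition changes sign in the second segment.
    """
    first = mcf(a, b)
    second = mcf(a, -b if neighbour_inverts else b)
    return (first + second) / 2


# Enumerate every switching pattern of the two-wire toy model.
patterns = list(product((+1, -1), (+1, 0, -1)))
hybrid = [average_mcf(a, b, True) for a, b in patterns]
conventional = [average_mcf(a, b, False) for a, b in patterns]
```

In this toy model every entry of `hybrid` is 1, while `conventional` ranges from 0 to 2, which is the pattern dependence the technique is meant to remove.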

5.
A smart repeater is proposed for driving capacitively-coupled, global-length on-chip interconnects that alters its drive strength dynamically to match the relative bit pattern on the wires and thus the effective capacitive load. This is achieved by partitioning the driver into main and assistant drivers; for a higher effective load capacitance both drivers switch, while for a lower effective capacitance the assistant driver is quiet. In a UMC 0.18-μm technology the potential energy saving is around 10% and the reduction in jitter 20%, in comparison to a traditional repeater for typical global wire lengths. It is also shown that the average energy saving for nanometer technologies is in the range of 20% to 25%. The driver architecture exploits the fact that as feature sizes decrease, the capacitive load per transistor shrinks, whereas global wire loads remain relatively unchanged. Hence, the smaller the technology, the greater the potential saving.

6.
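Li Zhiyan, Yan Xiaolang. Microelectronics (《微电子学》), 1999, 29(3): 164-168
An effective wire-sizing algorithm for clock routing is proposed. By analyzing the delay sensitivity of each branch in the clock tree, the algorithm selects the globally best wires for sizing so as to minimize the path delay of the clock tree. After delay optimization, to bring the clock skew below a given constraint, wire sizing is used to redistribute the delays of all clock sinks: the delay of the slowest sink is minimized while the delay of faster paths is suitably increased, which further improves the clock tree delay. Experimental results show that the algorithm runs efficiently and that the path delay and clock skew of the clock tree are significantly improved.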

7.
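In deep-submicron design, reducing energy consumption and propagation delay are the two most important design goals for on-chip global buses. This paper proposes a spatio-temporal coding scheme for on-chip global buses that both improves performance and reduces peak and average energy. The scheme combines the advantages of spatial bus-invert coding and temporal coding circuit techniques: it eliminates opposite-direction transitions on adjacent wires while reducing the number of self transitions and coupling transitions. For designs that apply this bus coding technique to reduce bus delay and energy, a method for inserting repeaters on the bus is given that determines their appropriate sizes and positions, so that total bus energy is minimized while the target delay and slew-rate requirements are met. The method can be used to obtain an optimized trade-off between bus energy and delay under slew-rate constraints for a variety of coding techniques.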
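For readers unfamiliar with the spatial component named in this abstract, here is a minimal sketch of classic bus-invert coding only. It is not the paper's combined spatio-temporal scheme: it does not model the temporal coding or the coupling between adjacent wires, and the function names are illustrative.

```python
def bus_invert_encode(words, width):
    """Classic bus-invert coding: send the complement of a word whenever
    that toggles fewer data lines than sending the word itself.

    Returns a list of (bus_value, invert_flag) pairs. The bus is assumed
    to start at all zeros.
    """
    mask = (1 << width) - 1
    prev = 0
    encoded = []
    for word in words:
        word &= mask
        toggles = bin(prev ^ word).count("1")  # Hamming distance to bus state
        if toggles > width / 2:
            sent, invert = word ^ mask, 1
        else:
            sent, invert = word, 0
        encoded.append((sent, invert))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [sent ^ mask if invert else sent for sent, invert in encoded]
```

With this rule at most half of the data lines toggle per transfer, at the cost of one extra invert line.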

9.
This paper presents a differential current-sensing technique as an alternative to existing circuit techniques for on-chip interconnects. Using a novel receiver circuit, it is shown that delay-optimal current-sensing is a faster option (by 20% on average) than the delay-optimal repeater insertion technique for single-cycle wires. The delay benefit of current-sensing increases with wire width. Unlike repeaters, current-sensing does not require placement of buffers along the wire and hence eliminates any placement constraints. Inductive effects are negligible in differential current-sensing. Current-sensing also provides a tighter bound on delay with respect to process variations. However, current-sensing has some drawbacks. It is power inefficient because of static power dissipation. It is essentially a low-swing signaling technique and hence is sensitive to full-swing aggressor noise.

10.
This paper proposes a repeater for boosting the speed of interconnects with low power dissipation. The repeater is designed and implemented at the 45 and 32 nm technology nodes. Delay and power dissipation are analyzed for various voltage levels at these technology nodes using SPICE simulations. Significant reductions in delay and power dissipation are observed compared to a conventional repeater. The results show that the proposed high-speed low-power repeater has a reduced delay for higher load capacitance. The proposed repeater is also compared with the LPTG CMOS repeater, and the results show that the proposed repeater has a lower delay. The proposed repeater is suitable for high-speed global interconnects and can drive large loads.

11.
In this paper, a global clock network that incorporates standing waves and coupled oscillators to distribute a high-frequency clock signal with low skew and low jitter is described. The key design issues involved in generating standing waves on a chip are discussed, including minimizing wire loss within an available technology. A standing-wave oscillator, which is a distributed oscillator that sustains ideal standing waves on lossy wires, is introduced. A clock grid architecture comprised of coupled standing-wave oscillators and differential low-swing clock buffers is presented, along with a compact circuit model for networks of oscillators. The measured results for a prototyped standing-wave clock grid operating at 10 GHz and fabricated in a 0.18-μm 6M CMOS logic process are presented. A technique is proposed for on-chip skew measurements with subpicosecond precision.

12.
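An Interconnect Optimization Model Based on Buffer Insertion under a Target Delay Constraint
Based on distributed RLC transmission lines, an optimization model is proposed that uses a Lagrangian function to adjust the number and size of inserted buffers so as to reduce interconnect power and area, subject to the interconnect delay meeting a target delay. In a 65 nm CMOS process, two groups of interconnects of different types are computed and compared, verifying the model's advantages in reducing interconnect power and area. The model is better suited to optimizing global interconnects, and the longer the interconnect, the more pronounced the benefit. It can be applied to computer-aided design of nanometer-scale SoCs and to integrated-circuit design optimization.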
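The abstract does not give the model's equations, so the following only shows the generic shape of such a constrained problem. The symbols are assumptions for illustration: $k$ is the number of buffers, $h$ their size, $P(k,h)$ the cost (power, area, or a weighted mix), $T(k,h)$ the interconnect delay, and $T_{\mathrm{target}}$ the target delay.

```latex
% Generic Lagrangian for minimizing cost under a delay equality constraint.
% P, T, k, h and T_target are illustrative symbols, not the paper's notation.
\begin{align}
  \mathcal{L}(k, h, \lambda) &= P(k, h) + \lambda \bigl( T(k, h) - T_{\mathrm{target}} \bigr), \\
  \frac{\partial \mathcal{L}}{\partial k} &= \frac{\partial P}{\partial k}
      + \lambda \frac{\partial T}{\partial k} = 0, \\
  \frac{\partial \mathcal{L}}{\partial h} &= \frac{\partial P}{\partial h}
      + \lambda \frac{\partial T}{\partial h} = 0, \\
  \frac{\partial \mathcal{L}}{\partial \lambda} &= T(k, h) - T_{\mathrm{target}} = 0.
\end{align}
```

Solving the three stationarity conditions together yields the buffer count and size that meet the target delay at minimum cost; in practice $k$ would then be rounded to an integer.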

13.
An analysis of a symbol timing recovery (STR) technique using digital processing is presented. The ratio of the discrete spectral power at the symbol rate frequency to the nearby continuous spectral power is used as a criterion of STR performance. It is shown that this power ratio equals the quality factor of the narrow bandpass filter and that it does not depend on the value of the delay element. The performance of the STR subsystem is consequently determined by the quality factor of the bandpass filter rather than by the specific delay. In addition, some experimental evidence is given that additive channel noise has little effect on the power ratio. A modified phase-locked loop with an LC prefilter is proposed to extract the symbol timing clock. This prefilter improves the acquisition and synchronization performance of the PLL. The STR technique presented here has the advantages of lower cost and simpler hardware implementation over other serial STR techniques.

14.
Thermal-Aware Methodology for Repeater Insertion in Low-Power VLSI Circuits
In this paper, the impact of thermal effects on low-power repeater insertion methodology is studied. An analytical methodology for thermal-aware repeater insertion that includes the electrothermal coupling between power, delay, and temperature is presented, and simulation results with global interconnect repeaters are discussed for 90- and 65-nm technology. Simulation results show that the proposed thermal-aware methodology can save 17.5% more power consumed by the repeaters compared to a thermal-unaware methodology for a given allowed delay penalty. In addition, the proposed methodology also results in a lower chip temperature, and thus, extra leakage power savings from other logic blocks.

15.
A buffer distribution algorithm for high-performance clock net optimization
We propose a new approach for optimizing clock trees, especially for high-speed circuits. Our approach gives the designer useful guidance through user-specified parameters that control design tradeoffs; three such tradeoffs are addressed in this paper. (1) First, to provide a “good” tradeoff between skew and wire length, a new clock tree routing scheme is proposed. The technique is based on a combination of hierarchical bottom-up geometric matching and minimum rectilinear Steiner tree. Our experiments complement the theoretical results. (2) For high-speed clock distribution in the transmission line mode (e.g., multichip modules), where interconnection delay dominates the clock delay, buffer congestion might exist in a layout. Using many buffers in a small wiring area results in substantial interline crosstalk as well as wirability problems when elongation of the imbalanced subtrees is necessary. Placing buffers evenly (locally or globally) over the plane with minimum impact on wire length helps avoid buffer congestion and results in less crosstalk between clock wires. Thus, an effective technique for buffer distribution is proposed. Experimental results verify the effectiveness of the proposed algorithms. (3) Finally, a postprocessing step that constrains phase delay is also proposed. The technique is based on a combination of hierarchical bottom-up geometric matching and bounded radius minimum spanning tree. The proposed algorithm has an important application in MCM clock net synthesis as well as VLSI clock net synthesis.

16.
Metallic carbon nanotubes (CNTs) have been proposed as a promising alternative to Cu interconnects in future integrated circuits (ICs) for their remarkable conductive, mechanical and thermal properties. Compact equivalent circuit models for single-walled carbon nanotube (SWCNT) bundles are described, and the performance of SWCNT bundle interconnects is evaluated and compared with traditional Cu interconnects at different interconnect levels for through-silicon-via-based three-dimensional (3D) ICs. It is shown that at a local level, CNT interconnects exhibit lower signal delay and smaller optimal wire size. At intermediate and global levels, the delay improvement becomes more significant with technology scaling and increasing wire lengths. For 1 mm intermediate and 10 mm global level interconnects, the delay of SWCNT bundles is only 49.49% and 52.82% that of the Cu wires, respectively.

17.
Interconnect Driver Design for Long Wires in Field-Programmable Gate Arrays
Each new semiconductor technology node brings smaller, faster transistors and smaller, slower wires. In particular, long interconnect wires in modern FPGAs now require rebuffering at interior points in the wire. This paper presents a framework for designing and evaluating long, buffered interconnect wires in FPGAs with near-optimal delay performance using HSPICE-derived delays. Given a target physical wire length, width, and spacing, the method determines the number, size, and position of buffers required to obtain the fastest signal velocity for programmable interconnect. While traditional hand calculations for ideal repeater placement can be used, they are not very accurate and ignore practical constraints such as the overhead effects of front-end multiplexing and driving logic, “finite” wire length, and a discrete number of repeaters. A metric introduced during the design is the “path delay profile”, or the arrival time of a signal at different points of a long wire. This method is used to design buffering strategies for interconnect based on 0.5, 2, and 3 mm wire lengths in 180 nm technology. These interconnect designs are coded into VPR along with an improved timing analyzer which accurately determines the “path delay profile” arrival times. Using VPR, average critical-path delay is reduced by 19% for 0.5 mm wires and by up to 46% for 3 mm wires over previous designs.
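The “path delay profile” can be pictured with a first-order Elmore estimate. The paper itself relies on HSPICE-derived delays, so the sketch below is only the kind of hand calculation it describes as less accurate. It assumes a uniform wire cut into equal segments, identical buffers, and hypothetical parameter names (`r_per_mm`, `c_per_mm`, `r_buf`, `c_buf`, `t_buf`) in any consistent unit system.

```python
def path_delay_profile(length_mm, n_segments, r_per_mm, c_per_mm,
                       r_buf, c_buf, t_buf=0.0):
    """Elmore-style arrival times at the end of each buffered segment.

    Each segment is driven by a buffer with output resistance r_buf and
    loaded by the next buffer's input capacitance c_buf. t_buf is an
    optional intrinsic buffer delay. Returns (position_mm, arrival_time)
    pairs, one per segment end.
    """
    seg_len = length_mm / n_segments
    r_wire = r_per_mm * seg_len
    c_wire = c_per_mm * seg_len
    # Driver charges the whole segment plus the load; the distributed wire
    # resistance sees half of its own capacitance plus the load.
    seg_delay = t_buf + r_buf * (c_wire + c_buf) + r_wire * (c_wire / 2 + c_buf)
    profile = []
    arrival = 0.0
    for i in range(n_segments):
        arrival += seg_delay
        profile.append((seg_len * (i + 1), arrival))
    return profile
```

Comparing the last entry of the profile for different `n_segments` values shows the usual tradeoff: more buffers shorten each quadratic wire term but add fixed buffer cost.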

18.
This paper describes an interconnect technique for subthreshold circuits that improves global wire delay and reduces the delay variation due to process-voltage-temperature (PVT) fluctuations. By internally boosting the gate voltage of the driver transistors, the operating region is shifted from the subthreshold region to the super-threshold region, enhancing performance and improving tolerance to PVT variations. Simulations of a clock distribution network using the proposed driver show a 66%-76% reduction in 3σ clock skew and an 84%-88% reduction in clock tree delay compared to conventional drivers. A 0.4-V test chip has been fabricated in a 0.18-μm 6-metal CMOS process to demonstrate the effectiveness of the proposed scheme. Measurement results show 2.6× faster switching speed and 2.4× less delay sensitivity under temperature variations.

19.
In this paper a coupled electro-thermal model is used for the optimal design of the clock distribution tree of a high-performance microprocessor. This approach allows thermal and electrical constraints to be taken into account simultaneously. In particular, timing issues such as the clock delay from the root of the tree to the leaves and the skew between the leaves are optimized by suitable wire and buffer sizing. At the same time, the lifetime constraints of the clock wires are checked and respected; these wires are affected by electromigration, which is enhanced by the high temperature reached in interconnects due to Joule self-heating.

20.
This paper presents new simulation results for the previously proposed transition skew coding (TSC) for global on-chip interconnects. Considering a 2-GHz global clock frequency at the 90-nm node, we show that TSC can be applied to a broad range of wire lengths on both semiglobal and global metal layers, while maintaining its energy efficiency and its advantages in terms of crosstalk reduction, signal integrity, and wiring and repeater area minimization.
