Similar literature
 20 similar documents found; search time 31 ms
1.
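Sun Ji, Mao Junfa, Li Xiaochun. Microelectronics (《微电子学》), 2005, 35(3): 293-296
Clock networks with specific nonzero skew have advantages over zero-skew clock networks: they help raise the clock frequency and reduce sensitivity to skew. This paper proposes a new nonzero-skew clock tree routing algorithm that combines clock node delays with clock sink positions to obtain a merging strategy ordered by maximum node delay, which shortens the clock tree wirelength. Experimental results show that, compared with the typical nearest-neighbor selection merging strategy, the algorithm reduces total wirelength by 20% to 30%.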

2.
The antenna effect is a phenomenon in the plasma-based nanometer process and directly influences the manufacturing yield of VLSI circuits. Because antenna-critical metal wires have sufficient charges to damage the thin gate oxides of the clock input ports connected by a clock tree, the standard cells or IPs cannot be driven by the clock source synchronously. For a given X-architecture clock tree that connects n clock sinks, we consider the antenna effect in the clock tree and propose a discharge-path-based antenna effect detection method. To fix the antenna violations, we use the jumper insertion technique recommended by foundries. Furthermore, we integrate the layer assignment technique to reduce the inserted jumper and via counts. Differing from the existing works, the delay of vias is considered in delay calculation, and a wire sizing technique is applied for clock skew compensation after fixing the antenna violations. Experimental results on benchmarks show that our algorithm runs in O(n²) time and, on average, inserts 48.21% fewer jumpers and uses 20.35% fewer vias than previous algorithms. Moreover, the SPICE simulation further verifies the correctness of the resulting clock tree.

3.
Exponentially tapered interconnect can reduce the dynamic power dissipation of clock distribution networks. A criterion for sizing H-tree clock networks is proposed. The technique reduces the power dissipated for an example clock network by up to 15% while preserving the signal transition times and propagation delays. Furthermore, the inductive behavior of the interconnects is reduced, decreasing the inductive noise. Exponentially tapered interconnects decrease the difference between the overshoots in the signal at the input of a tree by approximately 35%. As compared to a uniform tree with the same area overhead, overshoots in the signal waveform at the source of the tree are reduced by 40%.
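The link between exponential tapering and dynamic power can be illustrated with a small calculation. The sketch below is not the sizing criterion from the abstract; it assumes a simple wire model in which the width is w(x) = w0 * exp(-q * x), resistance per unit length is r_sheet / w(x), and capacitance per unit length is c_area * w(x). The function name, parameter names, and numbers are hypothetical, and fringing capacitance and inductance are ignored. Since dynamic power scales with the switched capacitance, the lower total capacitance of the tapered wire hints at where a power saving could come from, at the cost of higher resistance.

```python
import math


def tapered_wire_rc(w0, q, length, r_sheet, c_area):
    """Total resistance and capacitance of a wire of width w(x) = w0 * exp(-q * x).

    Assumed model (not taken from the abstract): resistance per unit length is
    r_sheet / w(x) and capacitance per unit length is c_area * w(x).
    """
    if q == 0.0:
        # Uniform wire: the integrals reduce to simple products.
        return r_sheet * length / w0, c_area * w0 * length
    # Closed-form integrals of r_sheet / w(x) and c_area * w(x) over [0, length].
    resistance = r_sheet * math.expm1(q * length) / (q * w0)
    capacitance = c_area * w0 * (-math.expm1(-q * length)) / q
    return resistance, capacitance


# Illustrative numbers only (arbitrary units).
r_uniform, c_uniform = tapered_wire_rc(2.0, 0.0, 1000.0, 0.05, 0.04)
r_taper, c_taper = tapered_wire_rc(2.0, 0.001, 1000.0, 0.05, 0.04)

# Fraction of wire capacitance removed by tapering from the same initial width.
cap_saving = 1.0 - c_taper / c_uniform
```

Note that this compares wires with the same initial width, whereas the abstract compares against a uniform tree with the same area overhead, so the numbers are not comparable to the reported 15%.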

4.
This paper presents a new methodology that implements a low swing clock tree. For low power IC design, low swing clock trees are one of the known techniques to lower the overall power dissipation through decreasing the power consumption of the clock network, while trading off the clock skew, local timing (slack) and the variation-tolerance (due to decreased noise margin). In this paper, an iterative skew minimization scheme for low swing clock trees is proposed via in-place buffer sizing considering multiple process corners. The proposed approach can preserve the power savings of the low swing clock tree implementation across multiple process corners. The effect of the decreased clock swing on the local timing is analyzed: the degradation in the timing slack is shown to be insignificant because the bounded clock slew eliminates most of the timing degradation on the clock network or the logic paths induced by the decreased clock swing. The experimental results show that the proposed methodology can achieve an average of up to 11% power savings, with a skew degradation of less than 5% compared to the original full-swing clock tree, satisfying a practical skew budget. The proposed scheme is highly practical as it only performs in-place buffer sizing on the original clock tree.

5.
Clock skew variations adversely affect timing margins, limiting performance and reducing yield, and may also lead to functional faults. Non-tree clock distribution networks, such as meshes and crosslinks, are employed to reduce skew and also to mitigate skew variations. These networks, however, increase the dissipated power while consuming significant metal resources. Several methods have been proposed to trade off power and wires to reduce skew. In this paper, an efficient algorithm is presented to reduce clock skew variations while minimizing power dissipation and metal area overhead. With a combination of nonuniform meshes and unbuffered trees (UBT), a variation-tolerant hybrid clock distribution network is produced. Clock skew variations are selectively reduced based on circuit timing information generated by static timing analysis (STA). The skew variation reduction procedure is prioritized for critical timing paths, since these paths are more sensitive to skew variations. A framework for skew variation management is proposed. The algorithm has been implemented in a standard 65 nm cell library using standard EDA tools, and tested on several benchmark circuits. As compared to other nonuniform mesh construction methods that do not support managed skew tolerance, experimental results exhibit a 41% average reduction in metal area and a 43% average reduction in power dissipation. As compared to other methods that employ skew tolerance management techniques but do not use a hybrid clock topology, an 8% average reduction in metal area and a 9% average reduction in power dissipation are achieved.

6.
Clock mesh has been widely used to distribute the clock signal across the chip. Clock mesh is driven by a top-level tree and a set of mesh buffers. We present fast and efficient combinatorial algorithms to simultaneously identify the candidate locations as well as sizes of the buffers driving the clock mesh. We show that such a sizing offers a better solution than inserting buffers of uniform size across the mesh. Due to the high redundancy, a mesh architecture offers high tolerance toward variations in clock skew. However, such a redundancy comes at the expense of mesh wire length and power dissipation. Based on survivable network theory, we formulate the problem to reduce the clock mesh by retaining only those edges that are critical to maintain redundancy. Such a formulation offers the designer the option to trade off between power and tolerance to process variations. We present efficient postprocessing techniques to reduce the size of the mesh buffers after mesh reduction. Experimental results indicate that our techniques can result in power savings up to 28% with less than 3.3% delay penalty. We also present driver models that can help in simulating the clock mesh. Such models achieve near-HSPICE accuracy with significant speedup in run time.

7.
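Taking clock tree design with the Cadence CCOPT engine as an example, this paper describes how to build a clock tree with the main goal of reducing clock tree power, using clock gating and selecting suitable buffers and inverters. Based on data from dynamic simulation and power analysis of the completed physical design, and with timing closure guaranteed, the paper analyzes how clock gating and the choice of different buffers and inverters affect the power and performance of the whole clock tree. Experimental results show that, for a chip using clock gating, the power of the whole clock tree is reduced by about 50% under different operating conditions, and that buffers and inverters are suitable for building the clock tree. In addition, when buffers and inverters of equal drive strength are used, a clock tree built with buffers achieves a 30% saving over one built with inverters.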

8.
This paper describes an interconnect technique for subthreshold circuits to improve global wire delay and reduce the delay variation due to process-voltage-temperature (PVT) fluctuations. By internally boosting the gate voltage of the driver transistors, the operating region is shifted from the subthreshold region to the super-threshold region, enhancing performance and improving tolerance to PVT variations. Simulations of a clock distribution network using the proposed driver show a 66%-76% reduction in 3σ clock skew value and an 84%-88% reduction in clock tree delay compared to using conventional drivers. A 0.4-V test chip has been fabricated in a 0.18-μm 6-metal CMOS process to demonstrate the effectiveness of the proposed scheme. Measurement results show 2.6× faster switching speed and 2.4× less delay sensitivity under temperature variations.

9.
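Li Chunwei. Electronic Design Engineering (《电子设计工程》), 2012, 20(7): 32-33, 37
Starting from the impact of on-chip variation (OCV) on chip performance, this paper analyzes and compares clock tree design with clock mesh design, focusing on the clock mesh's advantage in resisting OCV effects. Both methods are applied to an actual circuit for a design comparison. Analysis of the results verifies the theoretical analysis and shows that the clock mesh approach has a significant advantage in OCV tolerance and timing optimization.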

10.
In a typical clock distribution scheme, a central clock signal is distributed to several sites on the integrated circuit (IC). Local regenerators at these sites buffer the clock signal for the logic in regions close to the regenerator. Minimizing the skew between the clocks at these regeneration sites is critical. In recent times, this is becoming harder due to increasing intra-die processing variations. In this paper, we describe a novel technique to distribute a clock signal from a central location to several sites on a VLSI IC. Our technique uses a buffered H-tree and includes circuitry to dynamically remove any skew that may result due to intra-die processing variations. While existing approaches to deskewing a clock tree have utilized several phase detection circuits (number of phase detectors dependent on the number of clock regenerators), our method requires only one phase detector. Also, in our approach, the resolution of the phase detector is inconsequential, unlike existing techniques. Our deskewing technique can be applied dynamically, either at boot time or periodically during the operation of the IC. Using a six-level H-tree clock distribution network with process variations deliberately included, we demonstrate that our technique can reduce skews as high as 300 ps down to just 3 ps. We compare our clock tree with traditional buffered and unbuffered H-tree networks.
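For readers unfamiliar with the H-tree topology mentioned above, the following sketch generates the sink positions of an ideal H-tree and records the wire length from the root to each sink. It is only a geometric illustration, not the deskewing circuit from the abstract. It assumes that one "level" means one binary branching and that the segment length halves after every vertical branch; the function name and coordinates are hypothetical. Every sink ends up at the same distance from the root, which is why the nominal skew of an H-tree is zero and any measured skew comes from variations.

```python
def h_tree_sinks(levels, x=0.0, y=0.0, half=1.0, horizontal=True, path=0.0):
    """Return (x, y, path_length) for every sink of an ideal H-tree.

    Assumption: each level is one binary branching, alternating between
    horizontal and vertical, and the segment length halves after each
    vertical branch.
    """
    if levels == 0:
        return [(x, y, path)]
    if horizontal:
        children = [(x - half, y), (x + half, y)]
        next_half = half
    else:
        children = [(x, y - half), (x, y + half)]
        next_half = half / 2.0
    sinks = []
    for cx, cy in children:
        sinks.extend(
            h_tree_sinks(levels - 1, cx, cy, next_half, not horizontal, path + half)
        )
    return sinks


# A six-level tree under the assumption above has 2**6 = 64 sinks.
sinks = h_tree_sinks(6)
path_lengths = {p for _, _, p in sinks}
positions = {(sx, sy) for sx, sy, _ in sinks}
```

Under this construction `path_lengths` contains a single value, so an ideal tree is perfectly balanced by geometry alone.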

11.
A fully integrated, phase-locked loop (PLL) clock generator/phase aligner for the POWER3 microprocessor has been designed using a 2.5-V, 0.40-μm digital CMOS6S process. The PLL design supports multiple integer and noninteger frequency multiplication factors for both the processor clock and an L2 cache clock. The fully differential delay-interpolating voltage-controlled oscillator (VCO) is tunable over a frequency range determined by programmable frequency limit settings, enhancing yield and application flexibility. PLL lock range for the maximum VCO frequency range settings is 340-612 MHz. The charge-pump current is programmable for additional control of the PLL loop dynamics. A differential on-chip loop filter with common-mode correction improves noise rejection. Cycle-cycle jitter measurements with the microprocessor actively executing instructions were 10.0 ps rms, 80 ps peak to peak (P-P) measured from the clock tree. Cycle-cycle jitter measured for the processor in a reset state with the clock tree active was 8.4 ps rms, 62 ps P-P. PLL area is 1040 × 640 μm². Power dissipation is <100 mW.

12.
In this work, we propose a clock skew-aware aging mitigation (CSAM) technique which considers the effect of asymmetric aging on both the logic path and the clock tree. Simultaneous consideration of both parts in the design optimization problem enables us to reduce the area overhead while increasing the lifetime. For the aging mitigation of the logic path, we make use of both internal node control (INC) and input vector control (IVC) techniques while, for the clock tree circuits, a proper choice between NAND or NOR based integrated clock gating (ICG) cell is made. The optimization may be performed based on two objective functions of maximizing lifetime or minimizing the area overhead for a predetermined clock frequency and lifetime. To assess the efficacy of the proposed technique, we compared the lifetimes and area overheads for a set of circuits from ISCAS89 and ITC99 benchmark suites when CSAM and conventional techniques are used. The results, obtained using SPICE simulations for the circuits in a 45-nm technology, reveal an average lifetime improvement of 34% and an average area overhead reduction of 25.7% for the two objective functions, respectively.

13.
The on-chip inductive impact on signal integrity has been a problem for designs in deep-submicrometer technologies. The inductive impact increases the clock skew, max timing, and noise of bus signals. In this letter, circuit simulations using silicon-validated macromodels show that there is a significant inductive impact on the signal max timing (~10% pushout versus RC delay) and noise (~2× RC noise). In nanometer technologies, process variations have become a concern. Results show that device and interconnect process variations add ~3% to the RLC max-timing impact. However, their impact on the RLC signal noise is not appreciable. Finally, inductive impact in 65- and 45-nm technologies is investigated, which indicates that the inductance impact will not diminish as technology scales.

14.
A low-swing clock double-edge triggered flip-flop (LSDFF) is developed to reduce power consumption significantly compared to conventional flip-flops. The LSDFF avoids unnecessary internal node transitions to reduce power consumption. In addition, power consumption in the clock tree is reduced because the LSDFF uses a double-edge triggered operation as well as a low-swing clock. To prevent performance degradation of the LSDFF due to the low-swing clock, low-Vt transistors are used for the clocked transistors without significant leakage current problems. The power saving in flip-flop operation is estimated to be 28.6% to 49.6%, with an additional 78% power saving in the clock network.

15.
An integrated top-down design methodology is presented in this brief for synthesizing high performance clock distribution networks based on application dependent localized clock skew. The methodology is divided into four phases: (1) determining an optimal clock skew schedule composed of a set of nonzero clock skew values and the related minimum clock path delays; (2) designing the topology of the clock distribution network with delays assigned to each branch based on the circuit hierarchy, the aforementioned clock skew schedule, and minimizing process and environmental delay variations; (3) designing circuit structures to emulate the delay values assigned to the individual branches of the clock tree; and (4) designing the physical layout of the clock distribution network. The clock distribution network synthesis methodology is based on CMOS technology. The clock lines are transformed from distributed resistive-capacitive interconnect lines into purely capacitive interconnect lines by partitioning the RC interconnect lines with inverting repeaters. Variations in process parameters are considered during the circuit design of the clock distribution network to guarantee a race-free circuit. Nominal errors of less than 2.5% for the delay of the clock paths and 7% for the clock skew between any two registers belonging to the same global data path as compared with SPICE Level-3 are demonstrated.

16.
This paper describes substrate noise reduction techniques for synchronous CMOS circuits. Low-noise digital design techniques have been implemented and measured on a mixed-signal chip, fabricated in a 0.35 μm CMOS process on an EPI-type substrate with 10 Ω·cm EPI resistivity and 4 μm EPI layer thickness. The test chip contains one reference design and two digital low-noise designs with the same basic architecture. Measurements show more than a factor of 2 on average in r.m.s. noise reduction with penalties of 3% in area and 4% in power for the low-noise design employing a supply-current waveform-shaping technique based on a clock tree with latencies. The second low-noise design employing separate substrate bias for both n- and p-wells, dual-supply, and on-chip decoupling achieves more than a factor of 2 reduction in r.m.s. noise, with, however, a 70% increase in area, but with a 5% decrease in power consumption.

17.
As technology advances into nanometer territory, clock network layout plays an increasingly important role in determining circuit quality indicated by timing, power consumption, cost, power supply noise and tolerance to process variation. To alleviate the challenges to the existing routing algorithms due to the continuous increase of the problem size and the high-performance requirement, X-architecture has been proposed and applied to routing because it can reduce wirelength and via counts, and thus improve the performance and routability compared with the conventional Manhattan routing. In this paper, we investigate zero skew clock routing using X-architecture based on an improved greedy matching algorithm (GMZSTX). The fitted Elmore delay model is employed to improve the accuracy over the Elmore delay model. The interactions among distance, delay balance and load balance are analyzed. Based on this analysis, an effective and efficient greedy matching scheme is suggested to reduce wire snaking and to get a more balanced clock tree. The proposed algorithm is simple and fast for practical applications. Experimental results on benchmark circuits show that our algorithm (GMZSTX) achieves a reduction of 8.15% in total wirelength, 30.19% in delay and 55.31% in CPU time on average compared with zero skew clock routing in the Manhattan plane (BB+DME-2, which means using the top-down balanced bipartition (BB) method [T.H. Chao, Y.C. Hsu, J.M. Ho, et al., Zero skew routing with minimum wirelength, IEEE Trans. Circuits Syst. II—Analog & Digital Signal Process 39 (11) (1992) 799–814] to generate the tree topology and using the Deferred-Merge Embedding (DME) algorithm [T.H. Chao, Y.C. Hsu, J.M. Ho, et al., Zero skew routing with minimum wirelength, IEEE Trans. Circuits Syst. II—Analog & Digital Signal Process 39 (11) (1992) 799–814] to embed the internal nodes), and reduces delay and CPU time by 17.44% and 62.21% on average over the BB+DME-4 method (which is similar to BB+DME-2, but routing in X-architecture). Our SPICE simulation further verifies the correctness of the resulting clock tree.
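The core step of zero skew routing, choosing the tapping point on the wire that joins two subtrees so that both see the same delay, can be sketched as follows. This is not the GMZSTX algorithm: it uses the plain Elmore delay model rather than the fitted model named in the abstract, it ignores the X-architecture geometry, and the function and parameter names are hypothetical. Here `t1`, `t2` are the subtree delays, `c1`, `c2` their load capacitances, and `r`, `c` the wire resistance and capacitance per unit length.

```python
def zero_skew_merge(t1, c1, t2, c2, length, r, c):
    """Find the zero-skew tapping point on a wire joining two subtrees.

    Returns (x, delay1, delay2, merged_cap), where x is the fraction of the
    wire on subtree 1's side. Plain Elmore delay is assumed, with each wire
    piece modeled as a lumped resistance and half of its capacitance.
    """
    denom = r * length * (c * length + c1 + c2)
    x = (t2 - t1 + r * length * (c2 + c * length / 2.0)) / denom
    if not 0.0 <= x <= 1.0:
        # The delays are too unbalanced for this wire length; a real router
        # would lengthen one branch (wire snaking) instead.
        raise ValueError("tapping point falls outside the wire")
    l1 = x * length
    l2 = (1.0 - x) * length
    # Elmore delay from the tapping point to the sinks of each subtree.
    delay1 = r * l1 * (c * l1 / 2.0 + c1) + t1
    delay2 = r * l2 * (c * l2 / 2.0 + c2) + t2
    merged_cap = c1 + c2 + c * length
    return x, delay1, delay2, merged_cap


# Illustrative values only (arbitrary units).
x, d1, d2, cap = zero_skew_merge(t1=1.0, c1=0.5, t2=2.0, c2=1.0,
                                 length=10.0, r=0.1, c=0.2)
```

The tapping point moves toward the slower, more heavily loaded subtree (here subtree 2), so the longer wire piece goes to the faster subtree and the two delays come out equal. The `ValueError` branch corresponds to the situation where wire snaking would be needed.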

18.
On-chip temperature gradient has emerged as a major design concern for high-performance integrated circuits for the current and future technology nodes. Clock skew is an undesirable phenomenon for synchronous digital circuits that is exacerbated by the temperature difference between various parts of the clock tree. The main aim of this paper is to provide an intelligent solution for minimizing the temperature-dependent clock skew by designing dynamically adaptive circuit elements, particularly the clock buffers. Using an RLC model of the clock tree, we investigate the effect of on-chip temperature gradient on the clock skew for a number of temperature profiles that can arise in practice due to different architectures and applications. As an effective way of mitigating the variable clock skew, we present an adaptive circuit technique that senses the temperature of different parts of the clock tree and adjusts the driving strengths of the corresponding clock buffers dynamically to reduce the clock skew. Simulation results demonstrate that our adaptive technique is capable of reducing the skew by up to 92.4%, leading to much improved clock synchronization and design performance.

19.
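Xu Yi, Chen Shuming, Liu Xiangyuan. Journal of Semiconductors (《半导体学报》), 2011, 32(9): 095011-7
A bufferless resonant clock distribution network can minimize the clock power of a synchronous system. Without buffers, however, the skew of the clock network is affected by many factors, such as differences in the parasitic parameters of the clock interconnects, unbalanced clock loads, and process, voltage, and temperature variations. This paper proposes a hierarchical two-phase bufferless resonant clock interconnect network structure that combines the respective advantages of mesh and tree structures. In a TSMC 65 nm standard CMOS process, the skew and variation tolerance of this clock network structure are analyzed using a pipelined multiplier circuit. Post-layout simulation results show that the skew of the hierarchical clock network is more than 75% and 65% lower than that of pure mesh and pure H-tree clock networks, respectively. Under unbalanced clock loads or process, voltage, and temperature variations, the maximum clock network skew is less than 7 ps, no more than 1% of the whole clock period (about 760 ps).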
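The 1% bound quoted at the end of this abstract can be checked directly from its two figures, taking the 7 ps skew limit and the approximate 760 ps period as given:

```latex
\frac{t_{\text{skew,max}}}{T_{\text{clk}}} < \frac{7\ \text{ps}}{760\ \text{ps}} \approx 0.0092 = 0.92\% < 1\%
```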

20.
Conventional interconnections for digital clock distribution pose a severe power consumption problem for GHz clock distribution due to transmission line losses. Therefore, we have proposed an RF clock distribution (RCD) scheme for high-speed digital applications, in particular a multiprocessor system using global clocking. This paper first reports system power and signal integrity analysis results including skew, jitter, impedance mismatch, and noise for RF clock distribution, especially in the GHz range. Based on this analysis, a novel signal integrity design methodology for RF clock distribution systems is proposed. The clock skew created by process parameter variations is modeled and predicted. The system comprises an RF clock transmitter as a clock generator, an H-tree with junction couplers as a clock distributing network and an RF receiver as a digital clock-recovery module. Flip-chip interconnections for the chip-to-substrate assembly and 0.35 μm TSMC CMOS technology for the RF clock receiver are assumed. EMI analysis for 2 GHz 16-node-board-level RF clock distribution networks is conducted using 3D full-wave EM simulation. Finally, the RCD as a low power and high performance clocking method is demonstrated using HP's Advanced Design System (ADS) simulation, considering microwave frequency interconnection models and process parameter variations. In addition, test vehicles for both 2 GHz 16-node and 5 GHz 64-node board-level RF clock distribution networks were implemented and measured using thin, low-loss, and low permittivity Rogers RO3003 high-frequency organic substrate.
