Similar Articles
1.
As technology advances into nanometer territory, clock network layout plays an increasingly important role in determining circuit quality as indicated by timing, power consumption, cost, power-supply noise, and tolerance to process variation. The continuous growth in problem size and the demand for high performance challenge existing routing algorithms. To address this, the X-architecture has been proposed and applied to routing because it reduces wirelength and via count, and thereby improves performance and routability compared with conventional Manhattan routing. In this paper, we investigate zero-skew clock routing in the X-architecture based on an improved greedy matching algorithm (GMZSTX). The fitted Elmore delay model is employed to improve accuracy over the Elmore delay model. The interactions among distance, delay balance, and load balance are analyzed. Based on this analysis, an effective and efficient greedy matching scheme is proposed to reduce wire snaking and obtain a more balanced clock tree. The proposed algorithm is simple and fast enough for practical applications. Experimental results on benchmark circuits show that GMZSTX achieves average reductions of 8.15% in total wirelength, 30.19% in delay, and 55.31% in CPU time compared with zero-skew clock routing in the Manhattan plane (BB+DME-2, which uses the top-down balanced bipartition (BB) method to generate the tree topology and the Deferred-Merge Embedding (DME) algorithm to embed the internal nodes, both from [T.H. Chao, Y.C. Hsu, J.M. Ho, et al., Zero skew routing with minimum wirelength, IEEE Trans. Circuits Syst. II: Analog & Digital Signal Process 39 (11) (1992) 799–814]). It also reduces delay and CPU time by 17.44% and 62.21% on average relative to BB+DME-4 (which is similar to BB+DME-2 but routes in the X-architecture). SPICE simulation further verifies the correctness of the resulting clock tree.
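A minimal sketch of two ingredients this abstract relies on follows. It assumes the plain Elmore model, not the paper's fitted variant (whose form is not given here), and assumes uniform per-unit wire resistance `r` and capacitance `c` on every wire direction. `x_distance` is the shortest 0/45/90/135-degree wire length between two points. `zero_skew_merge` places a tapping point on the wire joining two subtrees so both sides see equal Elmore delay (the classic exact zero-skew merge), and it snakes the wire when the delays are too unbalanced. All names are hypothetical; this is not GMZSTX itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Subtree:
    delay: float  # Elmore delay from the subtree root to its (equal-delay) sinks
    cap: float    # total capacitance seen at the subtree root


def x_distance(p, q):
    """Shortest octilinear (X-architecture) wire length between points p and q."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)


def _snake(fast, target_delay, r, c):
    """Wire length L with fast.delay + r*L*(c*L/2 + fast.cap) == target_delay (needs c > 0)."""
    dt = target_delay - fast.delay
    return (-r * fast.cap + math.sqrt((r * fast.cap) ** 2 + 2 * r * c * dt)) / (r * c)


def zero_skew_merge(a, b, length, r, c):
    """Merge subtrees a and b, joined by a wire of `length` (> 0), with zero Elmore skew.

    Returns (wire_a, wire_b, merged): wire lengths from the tapping point to a and b
    (their sum exceeds `length` when snaking is needed) and the merged subtree.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    rl = r * length
    # Tapping-point fraction measured from a, solving the Elmore delay balance.
    x = ((b.delay - a.delay) + rl * (b.cap + c * length / 2)) / (
        rl * (c * length + a.cap + b.cap))
    if 0.0 <= x <= 1.0:
        wire_a, wire_b = x * length, (1.0 - x) * length
    elif x > 1.0:  # a is much faster: tap at b and snake the wire towards a
        wire_a, wire_b = _snake(a, b.delay, r, c), 0.0
    else:          # b is much faster: tap at a and snake the wire towards b
        wire_a, wire_b = 0.0, _snake(b, a.delay, r, c)
    delay = a.delay + r * wire_a * (c * wire_a / 2 + a.cap)
    cap = a.cap + b.cap + c * (wire_a + wire_b)
    return wire_a, wire_b, Subtree(delay, cap)
```

In a greedy-matching flow, one would repeatedly pair nearby subtrees, use `x_distance` between their roots as `length`, and call `zero_skew_merge` until a single root remains.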

2.
As technology advances into the nanometer territory, interconnect delay has become a first-order effect on chip performance. To handle this effect, the X-architecture has been proposed for high-performance integrated circuits. In this paper, we present PIXAR, a performance-driven X-architecture router based on a novel multilevel framework. To fully consider performance-driven routing and take advantage of the X-architecture, PIXAR applies a novel multilevel routing framework that adopts a two-stage technique of top-down uncoarsening followed by bottom-up coarsening, with trapezoid-shaped track routing embedded between the two stages to assign long, straight diagonal segments for wirelength reduction. We also propose a performance-driven X-Steiner tree algorithm based on Delaunay triangulation to construct routing trees for performance optimization. Compared with state-of-the-art work, PIXAR achieves 100% routing completion for all circuits while reducing net delay.
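The abstract does not detail PIXAR's Delaunay-based X-Steiner construction. As a simpler stand-in, the sketch below builds a minimum spanning tree under the X-architecture (octilinear) distance using Prim's algorithm on the complete graph. Such a tree is a valid X-routed connection of the pins, so its length is an upper bound on the optimal X-Steiner wirelength. The paper instead draws candidate edges from a Delaunay triangulation; this version considers all pin pairs. Function names are hypothetical.

```python
import math


def x_dist(p, q):
    """Shortest octilinear wire length between points p and q."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)


def x_mst(pins):
    """Prim's MST over all pin pairs with X-architecture edge lengths.

    Returns (edges, total_length), where edges are (parent_index, child_index) pairs.
    """
    n = len(pins)
    if n < 2:
        return [], 0.0
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each pin to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges, total = [], 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
            total += best[u]
        for v in range(n):  # relax connections through the newly added pin
            if not in_tree[v]:
                d = x_dist(pins[u], pins[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges, total
```

For pins at (0, 0), (1, 1), and (2, 0), the tree uses the two diagonal edges for a total of 2√2, versus 4 for a Manhattan spanning tree.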

3.
Many clock mesh methodologies have been introduced for clock distribution networks in two-dimensional integrated circuits, such as methods that reduce total wirelength to lower power consumption and methods that reduce clock skew variation through buffer placement and sizing. In this paper, we present a clock mesh methodology that reduces both clock skew and total wirelength in three-dimensional integrated circuits. To reduce total wirelength, we construct a smaller mesh on each die to which the clock source is not directly connected. We also insert through-silicon vias (TSVs) to distribute the clock signal using an effective clock TSV insertion algorithm, which reduces the total wirelength on each die. On average, the proposed methods reduce total wirelength by 12.2%, clock skew by 16.11%, and clock skew variation by 11.74% on the benchmark circuits, at the cost of a 2.49% increase in buffer area.

4.
In ultra-deep-submicron very-large-scale integration (VLSI) designs, clock network layout plays an increasingly important role in determining circuit quality as indicated by timing, power consumption, cost, power-supply noise, and tolerance to process variations. In this brief, a new merging scheme is proposed for prescribed nonzero-skew routing, which is useful for reducing clock cycle time, suppressing power-supply noise, and improving tolerance to process variations. The technique is simple and easy to implement in practice. Experimental results on benchmark circuits, with both buffered and unbuffered routing, show large improvements in wirelength and buffer cost over existing works.
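The brief does not state its merging formula. Under the plain Elmore model (an assumption), a prescribed skew between two subtrees reduces to the zero-skew tapping-point equation with one subtree's delay shifted by the target skew. The sketch below computes that tap fraction and omits wire snaking. `prescribed_skew_tap` and `arrival` are hypothetical names.

```python
def arrival(t_sub, c_sub, wire, r, c):
    """Elmore arrival at a subtree's sinks through `wire` units of wire from the tap point."""
    return t_sub + r * wire * (c * wire / 2 + c_sub)


def prescribed_skew_tap(t_a, c_a, t_b, c_b, length, r, c, skew):
    """Tap fraction x (measured from subtree a) on a wire of `length` such that
    arrival(a-side sinks) - arrival(b-side sinks) == skew under plain Elmore delay.

    Equivalent to a zero-skew merge in which a's delay is replaced by t_a - skew.
    """
    rl = r * length
    x = ((t_b - t_a + skew) + rl * (c_b + c * length / 2)) / (rl * (c * length + c_a + c_b))
    if not 0.0 <= x <= 1.0:
        raise ValueError("tap point falls off the segment; wire snaking would be needed")
    return x
```

With `skew = 0` this reduces to an ordinary zero-skew merge.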

5.
The antenna effect is a phenomenon in plasma-based nanometer processes that directly influences the manufacturing yield of VLSI circuits. Antenna-critical metal wires can collect enough charge to damage the thin gate oxides of the clock input ports connected by a clock tree, so the affected standard cells or IPs may no longer be driven synchronously by the clock source. For a given X-architecture clock tree that connects n clock sinks, we consider the antenna effect in the clock tree and propose a discharge-path-based antenna effect detection method. To fix antenna violations, we use the jumper insertion technique recommended by foundries. Furthermore, we integrate a layer assignment technique to reduce the number of inserted jumpers and vias. Unlike existing works, we account for via delay in the delay calculation and apply wire sizing for clock skew compensation after fixing the antenna violations. Experimental results on benchmarks show that our O(n^2) algorithm inserts 48.21% fewer jumpers and uses 20.35% fewer vias on average than previous algorithms. SPICE simulation further verifies the correctness of the resulting clock tree.
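The abstract does not spell out its discharge-path detection rule. The sketch below assumes a common simplified antenna model instead. A piece of metal that exists on a layer when that layer is etched is flagged only if three conditions all hold: it touches gate oxide, it has no discharge path to a diffusion, and its metal-to-gate area ratio exceeds a process limit. `MetalPiece`, `is_antenna_violation`, and the ratio form are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetalPiece:
    """Metal connected together on the layer being etched (hypothetical model)."""
    area: float          # metal area of the piece (e.g. length * width)
    gate_area: float     # gate-oxide area electrically connected to the piece
    has_discharge: bool  # True if the piece also reaches a diffusion (e.g. a driver output)


def is_antenna_violation(piece, max_ratio):
    """Flag a piece that collects charge with no discharge path and whose
    metal-to-gate area ratio exceeds max_ratio."""
    if piece.gate_area <= 0 or piece.has_discharge:
        return False  # no gate to damage, or charge can bleed off through a diffusion
    return piece.area / piece.gate_area > max_ratio
```

In this model, a jumper fixes a violation by cutting the long lower-layer wire and bridging it on an upper layer. Only the short piece next to the sink's gate then remains connected to that gate while the lower layer is etched.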

6.
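Cooperative virtual multiple-input multiple-output (VMIMO) transmission is an effective technique for optimizing wireless transmission performance. Combining physical-layer cooperative VMIMO with network-layer route selection into a cross-layer VMIMO routing scheme can exploit the diversity gain of VMIMO and significantly reduce network transmission energy consumption. Designing a VMIMO cooperative routing protocol that resists selfish nodes and cheating behavior in wireless networks while ensuring a high data forwarding rate and low transmission energy has become a major challenge in routing design. To improve VMIMO routing performance in selfish networks, a VMIMO cooperative routing algorithm based on a repeated routing game is proposed. The algorithm divides the network into multiple groups, and VMIMO is used to transmit data between groups. The inter-group route selection process is modeled as a repeated routing game. To increase the success rate of data forwarding, a fitness function is proposed to evaluate each node's reputation for participating in packet forwarding. On this basis, a fitness-based route selection sub-algorithm and a route forwarding sub-algorithm are proposed. It is proven theoretically that the proposed repeated routing game can reach Pareto optimality. Simulation results show that the algorithm promotes cooperation among selfish nodes, achieves a high data forwarding rate, and effectively reduces data transmission delay and energy consumption.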

7.
This article introduces a novel lookup table (LUT) and its use in configurable logic block (CLB) architectures for SRAM-based field-programmable gate arrays (FPGAs). The proposed CLB allows LUTs implementing NPN-equivalent functions to share SRAM tables. This reduces the size of the memories used to store the functions and also reduces the number of configuration bits required. We measured many characteristics of FPGAs using the new CLB architecture, including area, delay, routing, and power requirements. We found experimentally that, for many different FPGA architectures, CLBs can share one-fourth of their SRAM tables between pairs of basic logic elements (BLEs). This reduces both power consumption and area without negatively affecting routing or wirelength, at the cost of only a negligible 0.27% increase in critical path delay. Specifically, FPGAs consisting of CLBs with 16 BLEs and 34 inputs can be implemented with eight normal SRAMs and four SRAMs each shared between two BLEs, an overall reduction of four out of sixteen SRAM tables per CLB. With this CLB architecture, we measured an approximate 2% reduction in overall power consumption and an estimated 3% reduction in area.
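To make "NPN-equivalent" concrete, the sketch below computes an NPN canonical form of a small Boolean function by brute force over input Negations, input Permutations, and output Negation. A function is stored as a truth-table integer whose bit m is f(m). Functions with the same canonical form could in principle share one stored table. How the article realizes this sharing in hardware is not described in the abstract, and the function names here are hypothetical.

```python
from itertools import permutations, product


def transform(tt, n, perm, neg, out_neg):
    """Truth table of g(x) = f(y) XOR out_neg, where y[perm[i]] = x[i] XOR neg[i]."""
    g = 0
    for m in range(1 << n):
        y = 0
        for i in range(n):
            y |= (((m >> i) & 1) ^ neg[i]) << perm[i]
        if ((tt >> y) & 1) ^ out_neg:
            g |= 1 << m
    return g


def npn_canonical(tt, n):
    """Smallest truth table in tt's NPN class (n! * 2^(n+1) transforms; fine for n <= 4)."""
    return min(transform(tt, n, p, neg, o)
               for p in permutations(range(n))
               for neg in product((0, 1), repeat=n)
               for o in (0, 1))
```

For two inputs, AND (0b1000), OR (0b1110), and NOR (0b0001) all map to the same canonical table, while XOR (0b0110) does not. Brute force is exponential, which is why practical tools use faster canonicalization for larger n.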

8.
Sun Ji, Mao Junfa, Li Xiaochun. Microelectronics (微电子学), 2005, 35(3): 293-296
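A prescribed nonzero-skew clock network has advantages over a zero-skew one: it helps increase the clock frequency and reduce skew sensitivity. This paper proposes a new nonzero-skew clock tree routing algorithm that combines clock node delays with clock sink locations to obtain a merging strategy ordered by maximum node delay, which reduces clock tree wirelength. Experimental results show that, compared with the typical nearest-neighbor selection merging strategy, the algorithm reduces total wirelength by 20% to 30%.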

9.
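Beamforming is the main technique for target detection in sonar systems, and existing equipment mainly implements it with DSPs. Because a DSP executes sequentially, a single-DSP implementation of the beamforming algorithm incurs a very large delay from input signal to output result, while using five DSPs increases power consumption fivefold and still leaves a latency of 200 ms. Implementing the beamforming algorithm on an FPGA with a parallel computing structure greatly shortens this latency, and power consumption can also be reduced to 1/10 of that of a DSP implementation. The designed beamformer uses a 100 MHz clock; compared with the five-DSP solution, computation time drops from 200 ms to about 10 ms, and power consumption is reduced to 1/5 of the latter.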
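The abstract does not name the beamforming method. The sketch below assumes conventional time-domain delay-and-sum on a uniform line array. The parameters `spacing` (sensor pitch), `sound_speed`, `fs` (sample rate), and `theta_deg` (steering angle from broadside) are hypothetical. Each channel's contribution is computed independently, which is the kind of structure that maps onto parallel FPGA hardware.

```python
import math


def delay_and_sum(channels, spacing, sound_speed, fs, theta_deg):
    """Steer a uniform line array to theta_deg (0 = broadside) by delay-and-sum.

    channels[k] holds the samples of sensor k (all the same length).
    Steering delays are rounded to whole samples.
    """
    s = math.sin(math.radians(theta_deg))
    shifts = [round(k * spacing * s / sound_speed * fs) for k in range(len(channels))]
    base = min(shifts)
    shifts = [d - base for d in shifts]  # make every shift non-negative
    n = len(channels[0])
    out = [0.0] * n
    # Each channel is independent; in hardware these loops can run in parallel.
    for samples, d in zip(channels, shifts):
        for i in range(n - d):
            out[i] += samples[i + d]  # advance sensor k by its arrival lag
    return out
```

When the steering angle matches the arrival direction, the per-sensor copies of a pulse line up and add coherently; otherwise they stay spread out in time.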

10.
In this brief, we propose a new physical design technique for subquarter-micrometer systems-on-a-chip (SoCs). By optimizing the routing grid spacing of each layer individually, coupling effects such as crosstalk noise, crosstalk-induced delay variation, and coupling power consumption are almost eliminated with little runtime penalty. Experiments are performed on the design of an image processing circuit in a subquarter-micrometer CMOS process with multilayer interconnects. Simply by employing the proposed technique, the maximum delay and the power consumption are decreased simultaneously by up to 15% and 10%, respectively, without any other process improvements.
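A first-order model helps explain why widening routing-grid spacing suppresses coupling. The sketch below uses a parallel-plate estimate of sidewall coupling capacitance, which ignores fringing, and a charge-sharing estimate of peak noise on an undriven victim. Both are simplifying assumptions, not the brief's extraction model, and the function names are hypothetical.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def coupling_cap(eps_r, thickness, length, spacing):
    """Parallel-plate estimate of sidewall coupling between two parallel wires (F); fringing ignored."""
    return eps_r * EPS0 * thickness * length / spacing


def charge_sharing_noise(vdd, c_couple, c_ground):
    """First-order peak noise on an undriven victim when the aggressor swings by vdd."""
    return vdd * c_couple / (c_couple + c_ground)
```

Under this model, doubling the spacing halves the coupling capacitance, which lowers both the crosstalk noise and the coupling component of switching power.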

11.
Useful-Skew-Driven Clock Net Construction and Optimization
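A new clock routing algorithm is proposed that combines the top-down and bottom-up approaches to clock tree topology generation, targets minimum clock delay and total wirelength, and applies useful skew in clock tree construction. Benchmark results show that, compared with zero-skew algorithms, the algorithm effectively reduces total clock tree wirelength and improves clock tree performance.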
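Useful skew is usually bounded per launch/capture register pair by setup and hold constraints. Consider a combinational path from register i (launch) to register j (capture), and define skew = t_i - t_j. Setup requires t_i + d_max + t_setup <= t_j + T. Hold requires t_i + d_min >= t_j + t_hold. The sketch below computes that standard window, assuming d_min and d_max include clock-to-Q delay and times are integers in picoseconds. The abstract does not give the paper's own constraint formulation, and the function name is hypothetical.

```python
def permissible_skew_range(d_min, d_max, period, t_setup, t_hold):
    """Window [lo, hi] for skew = t_launch - t_capture satisfying hold and setup.

    d_min / d_max: shortest / longest launch-to-capture path delay (incl. clock-to-Q).
    """
    lo = t_hold - d_min              # hold:  skew >= t_hold - d_min
    hi = period - d_max - t_setup    # setup: skew <= T - d_max - t_setup
    if lo > hi:
        raise ValueError("no feasible skew at this clock period")
    return lo, hi
```

A zero-skew tree simply uses skew = 0 for every pair. A useful-skew tree may pick any value inside each window, which gives the router freedom to shorten wires.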

12.
Topology Construction and Optimization of Skew-Constrained Clock Nets
Liu Yi, Hong Xianlong, Cai Yici. Chinese Journal of Semiconductors (半导体学报), 2002, 23(11): 1228-1232
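A new topology construction and optimization method is proposed that combines the advantages of several topology construction methods, handling the skew constraint globally while optimizing wirelength locally. Experimental results show that it effectively controls the skew between nodes while reducing the overall wirelength of the clock routing tree.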

13.
The Y architecture has recently received much attention because of its potential advantages, such as substantially reduced wirelength and power consumption and significantly improved throughput. To exploit these advantages fully, several hexagon/triangle placement (HTP) algorithms suited to the Y architecture have been presented, but they do not include wirelength optimization. Wirelength estimation is fundamental to guiding wirelength optimization in early design stages. In this paper, we present APWL-Y, an accurate and efficient wirelength estimation technique for the Y architecture, and especially for HTP floorplanners and placers. The average error of APWL-Y is 4.41% over 1.57 million nets from industrial circuits. While developing APWL-Y, we found that the 3-SMT wirelength is a power function of the aspect ratio of the bounding box of the given n-pin net. The time complexity of APWL-Y is O(n). APWL-Y is very effective in guiding wirelength optimization in an HTP placer. Moreover, we develop an efficient HTP algorithm with wirelength optimization driven by the APWL-Y estimator, and we present placement results under different optimization objectives. Compared with an HTP placer that optimizes only area, our placer reduces wirelength by 54.3% on average with a small area overhead of 9.07%. In addition, we explore the HPWL technique in the Y architecture. To the best of our knowledge, this paper is the first in-depth study of wirelength estimation in the Y architecture and of HTP floorplanning optimization that considers interconnects.
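As a point of reference for Y-architecture wirelength, the sketch below gives the exact shortest two-pin length when wires may run only at 0, 60, and 120 degrees. If |dy| <= √3·|dx|, the optimal route combines a horizontal and a 60-degree segment, of length |dx| + |dy|/√3. Otherwise it uses the two slanted directions, of length 2|dy|/√3. The larger of the two expressions is always the correct one. This is not APWL-Y, whose fitted formulas are not given in the abstract, and `y_distance` is a hypothetical name.

```python
import math


def y_distance(p, q):
    """Shortest wire length between p and q using only 0/60/120-degree directions."""
    dx, dy = abs(q[0] - p[0]), abs(q[1] - p[1])
    return max(dx + dy / math.sqrt(3), 2 * dy / math.sqrt(3))
```

A purely vertical connection of length 1 therefore costs 2/√3 (about 1.155) in the Y architecture. A connection along the 60-degree direction costs exactly its Euclidean length.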

14.
Block-Interlaced LDPC Decoders With Reduced Interconnect Complexity
Two design techniques are proposed for high-throughput low-density parity-check (LDPC) decoders. A broadcasting technique mitigates routing congestion by reducing the total global wirelength. An interlacing technique increases decoder throughput by processing two consecutive frames simultaneously. This brief discusses how these techniques can be used for both fully parallel and partially parallel LDPC decoders. For fully parallel decoders with code lengths of a few thousand bits, the half-broadcasting technique reduces the total global wirelength by about 26% without any hardware overhead. The block-interlacing scheme is applied to the design of two fully parallel decoders, increasing throughput by 60% and 71% at the cost of 5.5% and 9.5% gate-count overhead, respectively.

15.
Power-gating-aware design has been an active area of research in the last decade, aiming to reduce power dissipation while meeting a desired system throughput. In this study, an algorithm that integrates scheduling and binding is developed with the functional unit (FU) power-gating technique to achieve maximum leakage energy reduction under both performance and resource constraints. First, the possible leakage energy reductions of all idle intervals are analyzed by evaluating operation mobilities. Second, a split network representing the leakage energy reduction in each idle interval is constructed, and a min-cost-flow-based algorithm is applied to this network to evaluate the total leakage energy saving from power-gating FUs. Operations are then scheduled into clock cycles and bound to FUs so as to maximize the leakage energy saving. Finally, suitable FUs are clustered under power-domain constraints to maximize the leakage energy saving while reducing the area and wirelength penalties of fine-grained power gating. Experimental results show that the proposed algorithms are effective at saving leakage energy.
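The reasoning behind "the possible leakage energy reduction of an idle interval" can be sketched as a break-even calculation. The sketch assumes a fixed per-cycle leakage energy `leak_per_cycle` and a fixed energy `overhead` per sleep/wake transition, both hypothetical parameters. The paper's mobility analysis and min-cost-flow formulation are not reproduced, and the function names are hypothetical.

```python
def idle_intervals(busy_cycles, horizon):
    """Lengths of maximal idle runs of one FU over cycles 0..horizon-1."""
    busy = set(busy_cycles)
    runs, run = [], 0
    for t in range(horizon):
        if t in busy:
            if run:
                runs.append(run)
            run = 0
        else:
            run += 1
    if run:
        runs.append(run)
    return runs


def gating_saving(idle_len, leak_per_cycle, overhead):
    """Net leakage energy saved by gating one idle interval; zero below break-even."""
    return max(0.0, leak_per_cycle * idle_len - overhead)


def break_even_cycles(leak_per_cycle, overhead):
    """Idle length at which gating exactly pays for its own transition energy."""
    return overhead / leak_per_cycle
```

Scheduling matters because moving operations within their mobility can merge short idle runs, which are below break-even and therefore worthless, into longer runs that pay off.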

16.
Dynamic voltage scaling has been widely acknowledged as a powerful technique for trading off power consumption and delay in processors. Recently, variable-frequency (and variable-voltage) parallel and serial links have also been proposed, which save link power by exploiting variations in bandwidth requirements. This provides a new dimension for power optimization in distributed embedded systems connected by a voltage-scalable interconnection network. At the same time, it poses new challenges for variable-voltage scheduling and flow control. First, the variable-voltage scheduling algorithm should be able to trade off power consumption and delay jointly for processors and links. Second, for a variable-frequency network, the scheduling algorithm should not only respect real-time constraints but also be consistent with the underlying flow-control techniques. In this paper, we address joint dynamic voltage scaling for variable-voltage processors and communication links in such systems. We propose a scheduling algorithm for real-time applications that captures both data-flow and control-flow information. It efficiently routes communication events over multiple hops and allocates slack among heterogeneous processors and communication links to maximize energy savings while meeting all real-time constraints. Our experimental study shows that, on average, joint voltage scaling of processors and links consumes 32% less power than voltage scaling of processors alone.
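Slack allocation saves energy because of a simple relation, sketched below. If a task (or a message on a link) has N cycles of work and deadline D, the slowest feasible clock is N/D. The sketch assumes supply voltage scales linearly with frequency, so energy per cycle scales roughly with C·V² and hence with f². Under that assumption, stretching work into slack cuts dynamic energy quadratically. These are textbook simplifications rather than the paper's power models, and the function names are hypothetical.

```python
def slowest_feasible_frequency(cycles, deadline, f_min, f_max):
    """Lowest clock frequency (Hz) that still finishes `cycles` within `deadline` seconds."""
    f = cycles / deadline
    if f > f_max:
        raise ValueError("deadline cannot be met even at f_max")
    return max(f, f_min)


def relative_dynamic_energy(f, f_max):
    """Dynamic energy relative to running at f_max, assuming V scales linearly with f."""
    return (f / f_max) ** 2
```

For example, 10^6 cycles with a 2 ms deadline on a 1 GHz processor can run at 500 MHz for roughly a quarter of the dynamic energy. The joint problem in the paper is deciding how to split end-to-end slack between processors and links.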

17.
Li Zhiyan, Yan Xiaolang. Microelectronics (微电子学), 1999, 29(3): 164-168
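An efficient wire-sizing (variable wire width) algorithm for clock routing is proposed. By analyzing the delay sensitivity of each branch in the clock tree, the algorithm selects the globally best wires to resize so that the clock tree path delay is minimized. After delay optimization, wire sizing is used to redistribute the delays of all clock sinks so that the clock skew falls below a given constraint. The delay of the slowest sink is minimized, while the delays of faster paths are increased appropriately, further improving clock tree delay. Experimental results show that the algorithm runs efficiently and significantly improves clock tree path delay and clock skew.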

18.
Clock gating is an effective way to reduce dynamic power in digital sequential circuits. In this paper, a gate-level, activity-correlation-based clustering clock-gating (CCG) technique is proposed for digital filters. The CCG technique exploits the correlations between flip-flops to decide how to group them for clock gating. An Activity Correlation Matrix (ACMtx) is introduced to describe these correlations, and a greedy clustering algorithm is proposed to find an optimized clustering scheme. Experiments on ISCAS'89 benchmarks show that the proposed technique reduces power consumption by 5.08% on average on top of an existing technique. For circuits with large numbers of flip-flops, it saves 15.84% more power on average.
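The abstract does not define the ACMtx entries or the exact greedy rule. The sketch below assumes one plausible choice. Each entry is the Jaccard index of two flip-flops' activity traces: the cycles in which both need the clock, divided by the cycles in which either does. Clustering is a simple first-fit that adds a flip-flop to the first cluster whose members are all correlated above a threshold. Both choices and all names are hypothetical.

```python
def activity_correlation(acts):
    """acts[i][t] is 1 if flip-flop i must be clocked in cycle t.
    Hypothetical ACMtx entry: |both active| / |either active| (Jaccard index)."""
    n = len(acts)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            both = sum(a & b for a, b in zip(acts[i], acts[j]))
            either = sum(a | b for a, b in zip(acts[i], acts[j]))
            m[i][j] = both / either if either else 1.0
    return m


def greedy_clusters(m, threshold):
    """First-fit clustering: join the first cluster whose members all correlate >= threshold."""
    clusters = []
    for i in range(len(m)):
        for cl in clusters:
            if all(m[i][j] >= threshold for j in cl):
                cl.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Flip-flops in one cluster share a gating cell. High correlation means the shared enable is rarely on just for one member, so few clock edges are wasted.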

19.
Power loss and interference coexist in wireless transmissions, and their random uncertainty is aggravated by the mobility of sensor nodes. A probabilistic interference model, based on the physical model and on random fading of the received signal power, was proposed to capture the uncertainty of wireless interference. In addition, an interference-aware routing metric was designed that integrates interference, routing convergence, and the residual energy of nodes. Furthermore, an interference-aware probabilistic routing algorithm was proposed for mobile wireless sensor networks, and its correctness and its time and space complexities were proved. NS-2 simulation experiments showed that the proposed algorithm achieves a higher packet delivery ratio than Greedy Perimeter Stateless Routing (GPSR) under various pause times and maximum moving speeds. The per-packet energy consumption and average delay were also taken into consideration, to better meet the needs of mobile scenarios that require higher reliability.
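The abstract does not state its fading distribution. If one assumes Rayleigh fading on the desired link and on every interferer, the received powers are exponentially distributed. The probability that the SINR clears a threshold θ then has a standard closed form: P = exp(-θ·N/S̄) · ∏_j 1/(1 + θ·Ī_j/S̄), where S̄ and Ī_j are mean received powers and N is noise power. The sketch below evaluates that expression. It illustrates the kind of probabilistic interference model described rather than the paper's exact model, and the function name is hypothetical.

```python
import math


def success_probability(mean_signal, mean_interference, noise, sinr_threshold):
    """P[SINR >= sinr_threshold] when the desired signal and each interferer undergo
    independent Rayleigh fading (exponentially distributed received power).

    mean_interference: iterable of mean received interference powers.
    """
    p = math.exp(-sinr_threshold * noise / mean_signal)
    for i in mean_interference:
        p /= 1.0 + sinr_threshold * i / mean_signal
    return p
```

One hypothetical way to fold this into a routing metric is to weight each link by -log of its success probability. A path's total cost then tracks the probability that every hop succeeds.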

20.
We propose a simulated-annealing-based zero-skew clock net construction algorithm that works in any routing space, from Manhattan to Euclidean, with the added flexibility of optimizing either wirelength or propagation delay. We first devise an O(log n) tree-grafting perturbation function to construct a zero-skew clock tree under the Elmore delay model. This tree-grafting scheme can explore the entire solution space asymptotically. A Gauss-Seidel iteration procedure is then applied to optimize the Steiner point positions. Experimental results show that our algorithm achieves substantial delay reduction and encouraging wirelength minimization compared with previous works.

