首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
在深亚微米设计中,降低能耗和传播延迟是片上全局总线所面对的两个最主要设计目标.本文提出了一种用于片上全局总线的时空编码方案,它既提高了性能又降低了峰值能耗和平均能耗.该编码方案利用空间总线倒相编码和时间编码电路技术的优点,在消除相邻连线上反相翻转的同时,减少了自翻转数和耦合翻转数.在应用该总线编码技术降低总线延时和能耗的设计中,给出了一种总线上插入中继驱动器的设计方法,以确定它们合适的尺寸和插入位置,使得在满足目标延时和翻转斜率要求的同时总线总的能耗最小.该方法可用来为各种编码技术获得翻转斜率约束下的总线能耗与延时的优化折中.  相似文献   

2.
在深亚微米设计中,降低能耗和传播延迟是片上全局总线所面对的两个最主要设计目标.本文提出了一种用于片上全局总线的时空编码方案,它既提高了性能又降低了峰值能耗和平均能耗.该编码方案利用空间总线倒相编码和时间编码电路技术的优点,在消除相邻连线上反相翻转的同时,减少了自翻转数和耦合翻转数.在应用该总线编码技术降低总线延时和能耗的设计中,给出了一种总线上插入中继驱动器的设计方法,以确定它们合适的尺寸和插入位置,使得在满足目标延时和翻转斜率要求的同时总线总的能耗最小.该方法可用来为各种编码技术获得翻转斜率约束下的总线能耗与延时的优化折中.  相似文献   

3.
In this paper, we propose a new circuit structure, the transition aware global signaling (TAGS) receiver, that detects transitions at arbitrary switch points. The major performance advantage of this circuit occurs when it switches before the 50% point in the input transition. The TAGS receiver stores the next state of the line while quiet. Upon detection of a transition at the end of the line the output is temporarily driven by the stored next state. Transitions at the output of the receiver are much faster than at the end of the line since they are generated locally. Its ability to detect transitions before a standard inverter and locally generate them at its output, allows its use at the end of long interconnects with fewer repeaters for the same delay as the standard repeater paradigm. The need for fewer repeaters with the TAGS scheme results in lower power consumption for on-chip global communication, while also reducing the placement overhead involved with large buffer blocks. This is shown in the context of bus optimizations, where TAGS achieves up to 50% reduction in power compared to standard repeaters. In an industrial 0.13-/spl mu/m CMOS process, TAGS receivers enable 8-mm-long buses at 1.5-GHz clock rates without repeaters, while the traditional scheme required three repeaters on the line. An extensive analysis of crosstalk noise in the bus environment shows that TAGS can handle the noise levels produced in typical bus structures. Also, the variation of delay in the bus structure under worst-case power supply noise for the TAGS scheme is typically smaller than the delay variation using the standard repeater scheme.  相似文献   

4.
Capacitive crosstalk between adjacent signal wires has significant effect on performance and delay uncertainty of point-to-point on-chip buses in deep submicrometer (DSM) VLSI technologies. We propose a hybrid polarity repeater insertion technique that combines inverting and non-inverting repeater insertion to achieve constant average effective coupling capacitance per wire transition for all possible switching patterns. Theoretical analysis shows the superiority of the proposed method in terms of performance and delay uncertainty compared to conventional and staggered repeater insertion methods. Simulations at the 90-nm node on semi-global METAL5 layer show around 25% reduction in worst case delay and around 86% delay uncertainty minimization compared to standard bus with optimal repeater configuration. The reduction in worst case capacitive coupling reduces peak energy which is a critical factor for thermal regulation and packaging. Isodelay comparisons with standard bus show that the proposed technique achieves considerable reduction in total buffers area, which in turn reduces average energy and peak current. Comparisons with staggered repeater which is one of the simplest and most effective crosstalk reduction techniques in the literature show that hybrid polarity repeater offers higher performance, less delay uncertainty, and reduced sensitivity to repeater placement variation.   相似文献   

5.
We propose various low-latency spatial encoder circuits based on bus-invert coding for reducing peak energy and current in on-chip buses with minimum penalty on total latency. The encoders are implemented in dual-rail domino logic with interfaces for static inputs and static buses. A spatial and temporally encoded dynamic bus technique is also proposed for higher performance targets. Comparisons to standard on-chip buses of various lengths with optimal repeater configurations at the 130-nm node show the energy-delay and peak current-delay design space in which the different encoder circuits are beneficial. A 9-mm spatially encoded static bus exhibits peak energy gains beyond that achievable through repeater optimization for a single-cycle operation at 1 GHz, with delay and energy overhead of the encoding included. For throughput-constrained buses, the spatially encoded static bus can provide up to 31% reduction in peak energy, while the spatially and temporally encoded dynamic bus yields peak current reductions of more than 50% for all bus lengths. The encoder circuits show good scaling properties since the performance penalty from encoding decreases with scaled interconnects.  相似文献   

6.
In a parallel multiwire structure, the exact spacing and size of the wires determine both the resistance and the distribution of the capacitance between the ground plane and the adjacent signal carrying conductors, and have a direct effect on the delay. Using closed-form equations that map the geometry to the wire parasitics and empirical switch factor based delay models that show how repeaters can be optimized to compensate for dynamic effects, we devise a method of analysis for optimizing throughput over a given metal area. This analysis is used to show that there is a clear optimum configuration for the wires which maximizes the total bandwidth. Additionally, closed form equations are derived, the roots of which give close to optimal solutions. It is shown that for wide buses, the optimal wire width and spacing are independent of the total width of the bus, allowing easy optimization of on-chip buses. Our analysis and results are valid for lossy interconnects as are typical of wires in submicron technologies.  相似文献   

7.
Optimal interconnection circuits for VLSI   总被引:3,自引:0,他引:3  
The propagation delay of interconnection lines is a major factor in determining the performance Of VLSI circuits because the RC time delay of these lines increases rapidly as chip size is increased and cross-sectional interconnection dimensions are reduced. In this paper, a model for interconnection time delay is developed that includes the effects of scaling transistor, interconnection, and chip dimensions. The delays of aluminum, WSi2, and polysilicon lines are compared, and propagation delays in future VLSI circuits are projected. Properly scaled multilevel conductors, repeaters, cascaded drivers, and cascaded driver/ repeater combinations are investigated as potential methods for reducing propagation delay. The model yields optimal cross-sectional interconnection dimensions and driver/repeater configurations that can lower propagation delays by more than an order of magnitude in MOSFET circuits.  相似文献   

8.
Efficient RC low-power bus encoding methods for crosstalk reduction   总被引:1,自引:0,他引:1  
In on-chip buses, the RC crosstalk effect leads to serious problems, such as wire propagation delay and dynamic power dissipation. This paper presents two efficient bus-coding methods. The proposed methods simultaneously reduce more dynamic power dissipation and wire propagation delay than existing bus encoding methods. Our methods also reduce more total power consumption than other encoding methods. Simulation results show that the proposed method I reduces coupling activity by 26.7-38.2% and switching activity by 3.7%-7% on 8-bit to 32-bit data buses, respectively. The proposed method II reduces coupling activity by 27.5-39.1% and switching activity by 5.3-9% on 8-bit to 32-bit data buses, respectively. Both the proposed methods reduce dynamic power by 23.9-35.3% on 8-bit to 32-bit data buses and total propagation delay by up to 30.7-44.6% on 32-bit data buses, and eliminate the Type-4 coupling. Our methods also reduce total power consumption by 23.6-33.9%, 23.9-34.3%, and 24.1-34.6% on 8-bit to 32-bit data buses with the 0.18, 0.13, and 0.09 μm technologies, respectively.  相似文献   

9.
A closed-form expression for the propagation delay of a CMOS gate driving a distributed RLC line is introduced that is within 5% of dynamic circuit simulations for a wide range of RLC loads. It is shown that the error in the propagation delay if inductance is neglected and the interconnect is treated as a distributed RC line can be over 35% for current on-chip interconnect. It is also shown that the traditional quadratic dependence of the propagation delay on the length of the interconnect for RC lines approaches a linear dependence as inductance effects increase. On-chip inductance is therefore expected to have a profound effect on traditional high-performance integrated circuit (IC) design methodologies. The closed-form delay model is applied to the problem of repeater insertion in RLC interconnect. Closed-form solutions are presented for inserting repeaters into RLC lines that are highly accurate with respect to numerical solutions. RC models can create errors of up to 30% in the total propagation delay of a repeater system as compared to the optimal delay if inductance is considered. The error between the RC and RLC models increases as the gate parasitic impedances decrease with technology scaling. Thus, the importance of inductance in high-performance very large scale integration (VLSI) design methodologies will increase as technologies scale  相似文献   

10.
The use of regenerative feedback repeaters to reduce the delay in programmable interconnections is described. A static, complementary regenerative feedback (CRF) repeater is proposed. This CRF repeater locally regenerates the new level for a fixed time after a transition has been detected. Design issues and limitations are discussed. It is shown that rising transitions can propagate faster than falling transitions through a chain of overdriven nMOS switches with CRF repeaters. Experimental results from a 1.2 μm CMOS implementation show that the loaded delay through 64 switches for static and dynamic repeaters can be reduced by a factor 1.4-2 over conventional repeaters  相似文献   

11.
/sup A/ new approach to handle inductance effects for multiple signal lines is presented. The worst-case switching pattern is first identified. Then a numerical approach is used to model the effective loop inductance (L/sub eff/) for multiple lines. Based on a look-up table for L/sub eff/, an equivalent single line model can be generated to decouple a specific signal line from the others to perform static timing analysis. Compared to the use of full RLC netlists for multiple lines, this approach greatly improves the computational efficiency and maintains accuracy for timing and signal integrity analysis. We apply these models to repeater insertion in critical paths and find that, for a single line, the RLC model minimizes delay with fewer number of repeaters than RC model. However, for multiple lines, we find that same number of repeaters is inserted for optimal delay according to both the RC and RLC models.  相似文献   

12.
Coding for system-on-chip networks: a unified framework   总被引:1,自引:0,他引:1  
Global buses in deep-submicron (DSM) system-on-chip designs consume significant amounts of power, have large propagation delays, and are susceptible to errors due to DSM noise. Coding schemes exist that tackle these problems individually. In this paper, we present a coding framework derived from a communication-theoretic view of a DSM bus to jointly address power, delay, and reliability. In this framework, the data is first passed through a nonlinear source coder that reduces self and coupling transition activity and imposes a constraint on the peak coupling transitions on the bus. Next, a linear error control coder adds redundancy to enable error detection and correction. The framework is employed to efficiently combine existing codes and to derive novel codes that span a wide range of tradeoffs between bus delay, codec latency, power, area, and reliability. Using simulation results in 0.13-/spl mu/m CMOS technology, we show that coding is a better alternative to repeater insertion for delay reduction as it reduces power dissipation at the same time. For a 10-mm 4-bit bus, we show that a bus employing the proposed codes achieves up to 2.17/spl times/ speed-up and 33% energy savings over a bus employing Hamming code. For a 10-mm 32-bit bus, we show that 1.7/spl times/ speed-up and 27% reduction in energy are achievable over an uncoded bus by employing low-swing signaling without any loss in reliability.  相似文献   

13.
14.
Interconnect plays an increasingly important role in deep-submicrometer very large scale integrated technologies. Multiple design criteria are considered in interconnect design, such as delay, power, and bandwidth. In this paper, a repeater insertion methodology is presented for achieving the minimum power in an RC interconnect while satisfying delay and bandwidth constraints. These constraints determine a design space for the number and size of the repeaters. The minimum power is shown to occur at the edge of the design space. With delay constraints, closed form solutions for the minimum power are developed, where the average error is 7% as compared with SPICE. With bandwidth constraints, the minimum power can be achieved with minimum-sized repeaters. The effects of inductance on the delay, bandwidth, and power of an RLC interconnect with repeaters are also analyzed. By including inductance, the minimum interconnect power under a delay or bandwidth constraint decreases as compared with an RC interconnect.  相似文献   

15.
We propose a method to estimate the data bus width to the requirements of an application that is to run on a custom processor. The proposed estimation method is a simulation-based tool that uses Extreme Value Theory to estimate the width of an off-chip or on-chip data bus based on the characteristics of the application. It finds the minimum number of bus lines needed for the bus connecting the custom processor to other units so that the probability of a multicycle data transfer on the bus is extremely unlikely. The potential target platforms include embedded systems where a custom processor (i.e. an ASIC or a FPGA) in a system-on-a-chip or a system-on-a-board is connected to memory, I/O and other processors through a shared bus or through point-to-point links. Our experimental and analytical results show that our estimation method can reduce the data bus width and cost by up to 66% with an average of 38% for nine benchmarks. The narrower data bus allows us to increase the spacing between the bus lines using the silicon area freed from the eliminated bus lines. This reduces interwire capacitance, which in turn leads to a significant reduction of bus energy consumption. Bus energy can potentially be reduced up to 89% for on-chip data buses with an average of 74% for seven benchmarks. Also, reduction in the interwire capacitance improves the bus propagation delay and on-chip bus propagation delay can be reduced up to 68% with an average of 51% for seven benchmarks using a narrower custom data bus.  相似文献   

16.
The technique of optimal voltage scaling and repeater insertion is analyzed in this paper to reduce power dissipation on global interconnects. An analytical model for the maximum bit-rate of a very large scale integration interconnect with repeaters has been derived and results are compared with HSPICE simulations. The analytical model is also used to study the effects of interconnect length and scaling on throughput. The throughput-per-bit-energy is analyzed to determine an optimum combination of supply voltage and repeaters for a low-power global interconnect with 250 nm /spl times/ 250 nm cross-sectional dimensions implemented with the 180 nm micro-optical silicon system technology node. It is shown that the optimal supply voltage is approximately equal to twice the threshold voltage. A case study illustrates that a combination of 1 V supply along with one repeater per millimeter increases the throughput-per-bit-energy to over three times that of a latency-centric interconnect of 2 V, which results in a 70% reduction in power dissipation without any loss of throughput performance.  相似文献   

17.
The use of deep-submicrometer (DSM) technology increases the capacitive coupling between adjacent wires leading to severe crosstalk noise, which causes power dissipation and may also lead to malfunction of a chip. In this paper, we present a technique that reduces crosstalk noise on instruction buses. While previous research focuses primarily on address buses, little work can be applied efficiently to instruction buses. This is due to the complex transition behavior of instruction streams. Based on instruction sequence profiling, we exploit an architecture that encodes pairs of bus wires and permute them in order to optimize power and noise. A close to optimal architecture configuration is obtained using a genetic algorithm. Unlike previous bus encoding approaches, crosstalk reduction can be balanced with delay and area overhead. Moreover, if delay (or area) is most critical, our architecture can be tailored to add nearly no overhead to the design. For our experiments, we used instruction bus traces obtained from 12 SPEC2000 benchmark programs. The results show that our approach can reduce crosstalk up to 50.79% and power consumption up to 55% on instruction buses.  相似文献   

18.
Thermal-Aware Methodology for Repeater Insertion in Low-Power VLSI Circuits   总被引:1,自引:0,他引:1  
In this paper, the impact of thermal effects on low-power repeater insertion methodology is studied. An analytical methodology for thermal-aware repeater insertion that includes the electrothermal coupling between power, delay, and temperature is presented, and simulation results with global interconnect repeaters are discussed for 90- and 65-nm technology. Simulation results show that the proposed thermal-aware methodology can save 17.5% more power consumed by the repeaters compared to a thermal-unaware methodology for a given allowed delay penalty. In addition, the proposed methodology also results in a lower chip temperature, and thus, extra leakage power savings from other logic blocks.  相似文献   

19.
Crosstalk, propagation delay, pulse distortion in multiconductor buses for high-speed GaAs logic circuits are analyzed. A simple but accurate quasi-TEM model of the bus is developed, and a critical analysis is carried out both on the accuracy of different approximate lumped and distributed models and on the impact of such approximations on the time-domain response. Results on the behavior of multiconductor buses in the presence of realistic input waveforms, are presented and design criteria are obtained  相似文献   

20.
一种基于目标延迟约束缓冲器插入的互连优化模型   总被引:1,自引:1,他引:0  
基于分布式RLC传输线,提出在互连延迟满足目标延迟的条件下,利用拉格朗日函数改变插入缓冲器数目与尺寸来减小互连功耗和面积的优化模型. 在65nm CMOS工艺下,对两组不同类型的互连线进行计算比较,验证该模型在改善互连功耗与面积方面的优点. 此模型更适合全局互连线的优化,而且互连线越长,优化效果越明显,能够应用于纳米级SOC的计算机辅助设计和集成电路优化设计.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号