Similar documents
20 similar documents found
1.
Short-range parallel optical interconnect between integrated circuits can alleviate the bandwidth, power, and packaging-density issues associated with low-latency, high-bandwidth input/output over electrical interconnect. In this paper, we evaluate the option of true source-synchronous signaling over optical interconnect with a large number of channels, which reduces the substantial per-channel clock synchronization circuitry to a single instance. We also consider dc-unbalanced signaling to remove the need for data coding. Uniformity across channels is key to the feasibility of such an approach. An actual 64-channel parallel optical interconnect setup at 1.25 Gb/s per channel is examined, and models are drawn up for the performance and uniformity of the interconnect's constituent parts. Particular attention is given to statistical modeling of the coupling efficiency between a vertical-cavity surface-emitting laser (VCSEL) array and a multifiber connector. Although derived in the context of a uniformity study, the stochastic models and the modeling approach are valuable in their own right. In our case study, using a common logic threshold across all channels, which dc-unbalanced signaling requires, appears infeasible once all models are combined. Efficient true source-synchronous signaling, however, turns out to be within reach in carefully designed systems.
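To make the common-threshold argument concrete, here is a minimal Monte Carlo sketch. It assumes a deliberately simple, hypothetical model that is not the paper's: each channel's normalized coupling efficiency eta_i is Gaussian around 1, the received "1" level is eta_i, the "0" level is eta_i divided by an extinction ratio, and one shared threshold must clear every channel's levels by a fixed noise margin. The function names and all parameter values are illustrative only.

```python
import random

def common_threshold_window(ones, zeros, margin):
    """Return (lo, hi) if a single threshold separates every channel's '0' and '1'
    levels with the given noise margin on both sides, else None."""
    lo = max(zeros) + margin
    hi = min(ones) - margin
    return (lo, hi) if lo < hi else None

def common_threshold_yield(n_channels, eff_sigma, extinction_ratio, margin,
                           trials=200, seed=1):
    """Fraction of Monte Carlo trials in which a common threshold exists.

    Hypothetical model: channel i receives a '1' level eta_i and a '0' level
    eta_i / extinction_ratio, with eta_i ~ N(1, eff_sigma) clipped at zero.
    """
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        eta = [max(0.0, rng.gauss(1.0, eff_sigma)) for _ in range(n_channels)]
        zeros = [e / extinction_ratio for e in eta]
        if common_threshold_window(eta, zeros, margin) is not None:
            ok += 1
    return ok / trials

for sigma in (0.0, 0.1, 0.3):
    print(sigma, common_threshold_yield(64, sigma, extinction_ratio=5.0, margin=0.1))
```

With identical channels the window always exists; as the spread in coupling efficiency grows, the weakest and strongest of 64 channels close the window, which illustrates why a shared threshold is hard to guarantee.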

2.
The on-chip global interconnect with conventional Cu/low-k wires and a delay-optimized repeater scheme faces great challenges in the nanometer regime owing to its severe performance degradation. This paper describes analytical models and performance comparisons of novel interconnect technologies and circuit architectures that address the interconnect performance bottleneck. Carbon nanotube (CNT) and optics-based interconnects exhibit promising physical properties for replacing current Cu/low-k global interconnects. We quantify the performance of these novel interconnects and compare them with Cu/low-k wires for future high-performance integrated circuits, studying latency and power dissipation as functions of technology node and bandwidth density. Optical wires have the lowest latency and power consumption, whereas a CNT bundle has lower latency than Cu. A new circuit scheme, the “capacitively driven low-swing interconnect (CDLSI),” has the potential to deliver significant energy savings and latency reduction. We present an accurate analytical optimization model for the CDLSI wire scheme. In addition, we quantify and compare the delay and energy expenditure not only of the different interconnect circuit schemes but also of various future technologies, such as Cu, CNT, and optics. We find that the CDLSI circuit scheme outperforms conventional interconnects in latency and energy per bit at lower bandwidth requirements, whereas these advantages diminish at higher bandwidth requirements. Finally, we explore the impact of the CNT bundle and the CDLSI on the via blockage factor. CNT bundles significantly reduce via blockage, whereas the CDLSI does not alleviate it, although the CDLSI does reduce the number of repeaters owing to its differential signaling scheme.

3.
A source-synchronous I/O link with adaptive receiver-side equalization has been implemented in 0.13-µm bulk CMOS technology. The transceiver is optimized for small area (360 µm × 360 µm) and low power (280 mW). The analog equalizer is implemented as an 8-way interleaved, 4-tap discrete-time linear filter. Equalization improved the data rate over a 102 cm backplane interconnect by 110%. On-die adaptive logic determines the optimal receiver settings through comparator offset cancellation, transmitter-receiver data alignment, clock de-skew, and selection of the equalizer filter coefficients. The noise-margin degradation due to statistical variation in the converged coefficient values was less than 3%.
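The abstract does not say which adaptation algorithm the on-die logic uses. As an illustration only, the sketch below pairs a 4-tap discrete-time linear (FIR) filter with sign-sign LMS, a common low-complexity choice for hardware equalizers. The channel model (50% post-cursor ISI), the step size `mu`, and the function names are hypothetical, and the 8-way interleaving is not modeled.

```python
import random

def sign(v):
    """Return +1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def equalize(taps, samples):
    """Apply an N-tap discrete-time linear filter: y[n] = sum_k taps[k] * x[n-k]."""
    history = [0.0] * len(taps)
    out = []
    for x in samples:
        history = [x] + history[:-1]
        out.append(sum(c * h for c, h in zip(taps, history)))
    return out

def adapt_sign_sign_lms(samples, symbols, n_taps=4, mu=0.002):
    """Adapt the taps with sign-sign LMS (an assumed algorithm) against known symbols."""
    taps = [1.0] + [0.0] * (n_taps - 1)
    history = [0.0] * n_taps
    for x, d in zip(samples, symbols):
        history = [x] + history[:-1]
        err = sum(c * h for c, h in zip(taps, history)) - d
        taps = [c - mu * sign(err) * sign(h) for c, h in zip(taps, history)]
    return taps

# Hypothetical channel: every received sample carries 50% of the previous symbol.
rng = random.Random(0)
symbols = [rng.choice((-1.0, 1.0)) for _ in range(5000)]
samples = [d + 0.5 * p for d, p in zip(symbols, [0.0] + symbols[:-1])]
taps = adapt_sign_sign_lms(samples, symbols)
print([round(c, 3) for c in taps])
```

For a channel pulse response of [1, 0.5], the zero-forcing taps are [1, -0.5, 0.25, -0.125], so the adapted second tap should settle near -0.5.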

4.
This paper presents a comparative performance analysis of the impact of aging mechanisms on various flip-flops in CMOS and FinFET technologies. We consider the effects of Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI) on the robustness of high-performance flip-flops. To apply the BTI and HCI aging mechanisms, we use a long-term model to estimate ΔVth and apply the updated Vth in the transistor model file. The simulation results rank the flip-flops by speed and power consumption in each of the CMOS and FinFET technologies and confirm the superiority of static FinFET flip-flops over CMOS flip-flops. In addition, a comparative analysis of different FinFET flip-flop structures under temperature and VDD variations shows that the average percentage degradation of TDQmin and power-delay product (PDP) due to aging is significantly smaller than in comparable CMOS flip-flops.
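As a rough illustration of the "estimate ΔVth with a long-term model, then update Vth" flow, the sketch below assumes a power-law BTI model ΔVth = a·t^n and an alpha-power-law delay model. Neither model nor the constants `a`, `n`, and `alpha` come from the paper; they are placeholders, and HCI is not modeled at all.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def bti_delta_vth(t_seconds, a=0.002, n=1 / 6):
    """Hypothetical long-term BTI threshold shift (volts): dVth = a * t**n."""
    return a * t_seconds ** n

def aged_delay_ratio(vdd, vth0, dvth, alpha=1.3):
    """Delay ratio t_aged / t_fresh under the alpha-power law, delay ~ Vdd / (Vdd - Vth)**alpha."""
    return ((vdd - vth0) / (vdd - vth0 - dvth)) ** alpha

for years in (1, 5, 10):
    dvth = bti_delta_vth(years * SECONDS_PER_YEAR)
    ratio = aged_delay_ratio(0.8, 0.3, dvth)
    print(f"{years:2d} y: dVth = {dvth * 1000:.1f} mV, delay x{ratio:.3f}")
```

In a real flow the shifted Vth would be written back into the transistor model file and the flip-flops re-simulated, rather than using a closed-form delay ratio.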

5.
Networks on chips (NoCs) are becoming popular because they offer a solution to the interconnection problems of large integrated circuits (ICs). Even in a NoC, however, link power can become unacceptably high, and data rates are limited when conventional data transceivers are used. In this paper, we present a low-power, high-speed source-synchronous link transceiver that enables a 3.3× reduction in link power together with an 80% increase in data rate. A low-swing capacitive pre-emphasis transmitter combined with a double-tail sense amplifier enables speeds above 9 Gb/s over a 2 mm twisted differential interconnect while consuming only 130 fJ/transition, without the need for an additional supply. Multiple transceivers can be connected back-to-back to form a source-synchronous transceiver chain with a wave-pipelined clock, operating with 6σ offset reliability at 5 Gb/s.

6.
This paper presents the design of the Itanium™ processor system bus interface, which achieves a peak data bandwidth of 2.1 GB/s in a glueless four-way multiprocessing system. A source-synchronous data bus with differential strobes enables this high bandwidth. Topics covered include optimization techniques for the system topology, CPU package, signaling protocol, and I/O circuits. Highly accurate modeling and validation methodologies yield good correlation between experimental results and simulation data.

7.
Integration of partial scan and built-in self-test   Total citations: 2; self-citations: 0; citations by others: 2
Partial-Scan based Built-In Self-Test (PSBIST) is a versatile Design for Testability (DFT) scheme that employs pseudo-random BIST at all levels of test to achieve fault coverage greater than 98% on average, and supports deterministic partial scan at the IC level to achieve nearly 100% fault coverage. PSBIST builds its BIST capability on top of a partial scan structure by adding a test pattern generator, an output data compactor, and a PSBIST controller, much as a full-scan BIST is derived from a full-scan structure. However, for the scheme to be effective, there is a minimum requirement on which flip-flops in the circuit must be replaced by scan flip-flops and/or initialization flip-flops. In addition, test points are usually added to boost the fault coverage to the range of 95 to 100 percent. These test points are selected using a novel probabilistic testability measure that can be computed extremely fast for a special class of circuits. This class is precisely the type of circuit obtained after replacing some of the flip-flops with scan and/or initialization flip-flops. The testability measure also provides a useful quick estimate of the fault coverage right after the scan flip-flops are selected, even before the circuit is modified to incorporate PSBIST capability. While PSBIST provides all the benefits of BIST, it incurs lower area overhead and performance degradation than full scan. The area overhead is further reduced when the boundary-scan cells are reconfigured for BIST use.
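PSBIST adds a test pattern generator and an output data compactor. A common way to build these, assumed here rather than stated in the abstract, is a linear-feedback shift register (LFSR) for pseudo-random patterns and a multiple-input signature register (MISR) for compaction. This minimal sketch uses a 4-bit maximal-length LFSR; `toy_cut` is a hypothetical stand-in for the circuit under test.

```python
def lfsr_patterns(seed, taps, width, count):
    """Fibonacci LFSR pattern generator: shift left and feed back the XOR of the tap bits."""
    mask = (1 << width) - 1
    state = seed & mask
    for _ in range(count):
        yield state
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & mask

def misr_signature(responses, taps, width, seed=0):
    """Multiple-input signature register: one LFSR step, then XOR in one response word."""
    mask = (1 << width) - 1
    state = seed & mask
    for r in responses:
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (((state << 1) | fb) ^ r) & mask
    return state

def toy_cut(pattern):
    """Hypothetical 4-bit combinational circuit under test."""
    return (pattern ^ (pattern >> 1)) & 0xF

TAPS, WIDTH = (3, 2), 4  # feedback taps giving a maximal-length (period-15) 4-bit LFSR
patterns = list(lfsr_patterns(seed=0b0001, taps=TAPS, width=WIDTH, count=15))
golden = misr_signature([toy_cut(p) for p in patterns], TAPS, WIDTH)
print([format(p, "04b") for p in patterns], format(golden, "04b"))
```

Because the MISR state update is linear and invertible, any single-bit error in the response stream changes the final signature; multi-bit errors can still alias.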

8.
This paper proposes a bus architecture that improves the performance and/or power dissipation of on-chip buses. The proposed architecture reduces the delay on alternate lines by lowering the threshold voltage of their devices. Furthermore, shifting the signal transitions on adjacent lines in time reduces the worst-case coupling capacitance. Two implementations of this bus architecture, the alternate-Vt scheme and the alternate forward-body-biased scheme, are proposed and compared with a conventional bus scheme. For a flip-flop-to-flip-flop distance of 1800 µm, the proposed schemes use the gained delay slack to reduce the total device width, thereby reducing energy dissipation by 31.2%. For a 500-ps cycle time, the proposed bus schemes increase the maximum distance between flip-flops by 33%.

9.
High reliability against noise, high performance, and low energy consumption are key objectives in the design of on-chip networks. Some researchers have recently considered the impact of various error-control schemes on these objectives and on the tradeoffs between them. In all of these works, performance and reliability are measured separately. In this paper, however, we argue that using error-control schemes in on-chip networks results in degradable systems; hence, performance and reliability must be measured jointly using a unified measure, namely performability. Based on the traditional concept of performability, we provide a definition of “Interconnect Performability”. Analytical models are developed for interconnect performability and expected energy consumption. A detailed comparative analysis of the error-control schemes, using the performability analytical models and SPICE simulations, is provided, taking into account variations in voltage swing (used to reduce interconnect energy consumption) and in wire length. Furthermore, the impact of noise power and timing constraints on the effectiveness of the error-control schemes is analyzed.

10.
In systems-on-chip (SOCs), a non-negligible part of the operation time is spent on global wires with long delays. Retiming, that is, moving flip-flops in a circuit without changing its functionality, can be used to pipeline long interconnect wires in SOC designs. The problem of retiming over a netlist of macro-blocks, whose internal structures may not be changed and where flip-flops may not be inserted on some wire segments, is called the wire retiming problem. In this paper, we formulate the constraints of the wire retiming problem as a fixpoint computation and use an iterative algorithm to solve it. Experimental results show that this approach is several orders of magnitude more efficient than the previous one.
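One standard way to express retiming constraints, going back to Leiserson and Saxe, is as difference constraints on integer lags r(v): an edge u→v that holds w flip-flops holds w + r(v) - r(u) after retiming. The sketch below assumes that form (the paper's exact formulation may differ) and solves it by iterating r(v) = min(r(v), r(u) + c) until a fixpoint is reached, which is Bellman-Ford relaxation. The `min_ff`/`max_ff` edge bounds are a hypothetical encoding of "this long wire needs a flip-flop" and "no flip-flop may be placed on this segment".

```python
def solve_difference_constraints(nodes, constraints):
    """Find lags r with r[v] - r[u] <= c for every constraint (u, v, c).

    Starting from r = 0, repeatedly apply r[v] = min(r[v], r[u] + c) until nothing
    changes (a fixpoint). If values still change after len(nodes) + 1 rounds, the
    constraints contain a negative cycle and no legal retiming exists.
    """
    r = {v: 0 for v in nodes}
    for _ in range(len(nodes) + 1):
        changed = False
        for u, v, c in constraints:
            if r[u] + c < r[v]:
                r[v] = r[u] + c
                changed = True
        if not changed:
            return r
    return None

def retiming_constraints(edges):
    """Turn per-edge bounds (u, v, w, min_ff, max_ff) into difference constraints.

    After retiming, edge u->v holds w + r[v] - r[u] flip-flops, which must lie in
    [min_ff, max_ff]; max_ff=None means no upper bound.
    """
    cons = []
    for u, v, w, min_ff, max_ff in edges:
        cons.append((v, u, w - min_ff))      # w + r[v] - r[u] >= min_ff
        if max_ff is not None:
            cons.append((u, v, max_ff - w))  # w + r[v] - r[u] <= max_ff
    return cons

def retimed_weights(edges, r):
    """Flip-flop count on each edge after applying the lags r."""
    return {(u, v): w + r[v] - r[u] for u, v, w, _, _ in edges}

# Hypothetical netlist: wire A->B is too long and needs >= 1 flip-flop,
# B->C already holds 2, and no flip-flop may be placed on segment C->A.
NODES = ["A", "B", "C"]
EDGES = [("A", "B", 0, 1, None), ("B", "C", 2, 0, None), ("C", "A", 0, 0, 0)]
lags = solve_difference_constraints(NODES, retiming_constraints(EDGES))
print(lags, retimed_weights(EDGES, lags))
```

Retiming preserves the number of flip-flops on every cycle, so asking for three flip-flops on A->B in this two-flip-flop loop produces a negative cycle and the solver returns None.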

11.
As technology scales, shrinking wire widths increase interconnect resistivity, while decreasing interconnect spacing significantly increases coupling capacitance. This paper proposes reducing the number of bus lines of the conventional parallel-line bus (PLB) architecture by multiplexing every m bits onto a single line. This bus architecture, the serial-link bus (SLB), transforms an n-bit conventional PLB into an n/m-line (serial-link) bus. The advantage of SLBs is that they have fewer lines, so if the overall bus width is kept the same, an SLB has a larger line pitch. Increasing the line width reduces the line resistance in two ways: the cross-section grows, and the resistivity of sub-100 nm wires drops significantly as the line width increases. Increasing the line width and spacing also reduces the coupling capacitance between adjacent lines, but increases the line-to-ground capacitance. Thus, there exist an optimum degree of multiplexing m_opt and an optimum width-to-pitch ratio η_opt that minimize the bus energy dissipation and maximize the bus throughput per unit area. This paper determines the optimum degree of multiplexing and the optimum width-to-pitch ratio for maximum throughput per unit area and minimum energy dissipation for the 25–130 nm technologies. An encoding technique was also proposed and implemented to reduce the switching-activity penalty due to serialization. HSPICE simulations show that, for the same throughput per unit area as conventional parallel-line data buses, the SLB architecture reduces energy dissipation by up to 31% for a 64-bit bus implemented in an intermediate metal layer of a 50-nm technology, and a reduction of 53% is projected for a 25-nm technology.
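The n-bit to n/m-line mapping and the switching-activity penalty of serialization can be illustrated with a small sketch. The bit-to-line assignment (line l carries bits l·m through l·m + m - 1 over m beats) is a hypothetical choice, and the paper's encoding technique is not reproduced; the sketch only counts the extra transitions that such an encoding would target.

```python
def serialize(word, n, m):
    """Split an n-bit word over n // m lines and m beats (beat k on line l carries bit l*m + k)."""
    assert n % m == 0, "the degree of multiplexing m must divide the bus width n"
    lines = n // m
    return [[(word >> (line * m + beat)) & 1 for line in range(lines)]
            for beat in range(m)]

def deserialize(beats, n, m):
    """Inverse of serialize(): reassemble the n-bit word from its m beats."""
    word = 0
    for beat, values in enumerate(beats):
        for line, bit in enumerate(values):
            word |= bit << (line * m + beat)
    return word

def toggles(beat_stream):
    """Count line transitions between consecutive beats (a proxy for switching energy)."""
    return sum(a != b for prev, cur in zip(beat_stream, beat_stream[1:])
               for a, b in zip(prev, cur))

# Sending the same byte twice: no toggles on an 8-line PLB, but many on a 4-line SLB (m = 2).
word = 0x55
plb_stream = [[(word >> i) & 1 for i in range(8)]] * 2
slb_stream = serialize(word, 8, 2) + serialize(word, 8, 2)
print(toggles(plb_stream), toggles(slb_stream))  # → 0 12
```

Bits that are constant across words on a parallel bus can still toggle inside a word once they share a line, which is the penalty an SLB encoding scheme has to offset.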

12.
Market forces continually demand devices with increased functionality per unit area; these demands have been met through aggressive technology scaling, which unfortunately has adversely affected global interconnect delay and thereby reduced system performance. Line drivers have been used to mitigate the delay problem, but they consume considerable power. One way to reduce the drivers' power dissipation is to use lower supply voltages. However, a lower supply voltage impairs the performance of global-interconnect line drivers unless low-swing signalling techniques are implemented. The paper describes the design of a low-swing signalling scheme built around a low-swing driver, called the nLVSD driver, which is an improved version of the MJ-driver [1] designed by Juan A. Montiel-Nelson and Jose C. Garcia. Both low-swing driver schemes are then analysed and compared with a focus on their power consumption and performance, the main concerns in present-day IC design. The comparison showed that the nLVSD driver achieved a 34% improvement in power consumption and a 28% improvement in delay when driving a 10 mm interconnect. The two schemes were also compared in the presence of ±3σ process and voltage (PV) variations; the nLVSD driver proved more robust than the MJ-driver, with 33% and 44% improvements in power-consumption and delay variation, respectively. To further improve the robustness of the nLVSD scheme against process variation, the scheme was analysed to identify which process variables have the most impact on circuit delay and power consumption. For completeness, the effects of process variation on interconnect delay and power consumption were also analysed.

13.
System performance can be improved by employing scheduled skews at flip-flops. This optimization technique, called skewed-clock optimization, has been used successfully in memory designs to achieve high operating frequencies. Developing this technique raises two important issues. The first is selecting appropriate clock skews to improve system performance. The second is distributing the skewed clocks reliably in the presence of manufacturing and environmental variations. Without careful selection of clocking times and control of unintentional clock skews, the potential performance gain may not be achieved. In this paper, a theoretical framework is first presented for solving the problem of optimally scheduling skews. A novel self-calibrating clock distribution scheme is then developed that automatically tracks variations and minimizes unintentional skews, so that clocks with the proper skews can be delivered reliably.
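A well-known formulation of skew scheduling (in the style of Fishburn's clock skew optimization) writes each setup and hold requirement as a difference constraint on clock arrival times, so a given period is feasible exactly when the constraint graph has no negative cycle. The sketch below assumes that formulation, ignores the self-calibrating distribution part, and uses a hypothetical two-flip-flop example; it is not the paper's framework.

```python
def feasible_skews(n_ff, paths, period, setup=0.0, hold=0.0):
    """Return clock arrival times t meeting all setup/hold constraints, or None.

    paths: (i, j, dmin, dmax) for each combinational path from flip-flop i to j.
      setup: t[i] + dmax + setup <= t[j] + period  ->  t[i] - t[j] <= period - dmax - setup
      hold:  t[i] + dmin >= t[j] + hold            ->  t[j] - t[i] <= dmin - hold
    Each (u, v, c) below means t[v] - t[u] <= c and is solved by Bellman-Ford relaxation.
    """
    cons = []
    for i, j, dmin, dmax in paths:
        cons.append((j, i, period - dmax - setup))
        cons.append((i, j, dmin - hold))
    t = [0.0] * n_ff
    for _ in range(n_ff + 1):
        changed = False
        for u, v, c in cons:
            if t[u] + c < t[v] - 1e-12:
                t[v] = t[u] + c
                changed = True
        if not changed:
            return t
    return None  # still relaxing: negative cycle, period infeasible

def min_period(n_ff, paths, setup=0.0, hold=0.0, tol=1e-6):
    """Binary-search the smallest period for which a feasible skew schedule exists."""
    hi = max(p[3] for p in paths) + setup  # zero-skew period
    if feasible_skews(n_ff, paths, hi, setup, hold) is None:
        return None
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible_skews(n_ff, paths, mid, setup, hold) is not None:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical two-flip-flop loop: FF0 -> FF1 is slow, FF1 -> FF0 is fast.
paths = [(0, 1, 2.0, 8.0), (1, 0, 1.0, 2.0)]
print(min_period(2, paths))  # the zero-skew period would be 8.0
```

Here the setup constraint on FF0 -> FF1 requires t1 - t0 >= 8 - T while its hold constraint caps t1 - t0 at 2, so the minimum period is 6 rather than the zero-skew 8: the hold side, not the loop's average delay, becomes the limit.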

14.
Ultra Low-Power Clocking Scheme Using Energy Recovery and Clock Gating   Total citations: 1; self-citations: 0; citations by others: 1
A significant fraction of the total power in highly synchronous systems is dissipated in clock networks. Low-power clocking schemes are therefore promising approaches to low-power design. We propose four novel energy-recovery clocked flip-flops that enable energy recovery from the clock network, resulting in significant energy savings. The proposed flip-flops operate with a single-phase sinusoidal clock, which can be generated with high efficiency. In TSMC 0.25-µm CMOS technology, we implemented 1024 of the proposed energy-recovery clocked flip-flops on an H-tree clock network driven by a resonant clock generator that produces a sinusoidal clock. Simulation results show a 90% power reduction in the clock tree and total power savings of up to 83% compared with the same implementation using the conventional square-wave clocking scheme and flip-flops. Using a sinusoidal clock signal for energy recovery prevents the application of existing clock-gating solutions, so in this paper we also propose clock-gating solutions for energy-recovery clocking. Applying our clock gating to the energy-recovery clocked flip-flops reduces their idle-mode power by more than 1000× with negligible power and delay overhead in active mode. Finally, a test chip containing two pipelined multipliers, one designed with conventional square-wave clocked flip-flops and the other with the proposed energy-recovery clocked flip-flops, was fabricated and measured. The measurements show that the energy-recovery clocking scheme and flip-flops reduce power by 71% in the clock tree and 39% in the flip-flops, for an overall power saving of 25% for the multiplier chip.

15.
In this paper, we present a novel method for statistical inductance extraction and modeling of interconnects that accounts for process variations. The new method, called statHenry, is based on the collocation-based spectral stochastic method, in which orthogonal polynomials represent the statistical processes. The coefficients of the partial inductance orthogonal polynomial are computed via the collocation method, applying a fast multi-dimensional Gaussian quadrature method with sparse grids. To further improve efficiency, a random-variable reduction scheme is used. Given the interconnect wire variation parameters, the method derives a parameterized closed-form expression for the inductance. We show that both partial and loop inductance variations can be significant given width and height variations. The approach can work with any existing inductance extraction tool to extract variational partial and loop inductance or impedance. Experimental results show that the method is orders of magnitude faster than the Monte Carlo method for several practical interconnect structures.
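The core step, computing orthogonal-polynomial coefficients by Gaussian quadrature at collocation nodes, can be shown in one dimension. This sketch is not statHenry: it uses a single standard-normal variable, a full (not sparse) Gauss-Hermite rule, and a hypothetical smooth function `toy_inductance` in place of a real inductance extraction.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as H

def pc_coefficients(f, order, n_points=10):
    """Project f(xi), xi ~ N(0, 1), onto probabilists' Hermite polynomials He_k.

    a_k = E[f(xi) He_k(xi)] / k!, with the expectation evaluated by
    Gauss-Hermite quadrature at n_points collocation nodes.
    """
    nodes, weights = H.hermegauss(n_points)
    weights = weights / math.sqrt(2.0 * math.pi)  # normalize so the weights sum to 1
    fx = f(nodes)
    coeffs = []
    for k in range(order + 1):
        he_k = H.hermeval(nodes, [0.0] * k + [1.0])  # evaluate He_k at every node
        coeffs.append(float(np.sum(weights * fx * he_k)) / math.factorial(k))
    return coeffs

def pc_mean_and_variance(coeffs):
    """The mean is a_0; the variance is sum_k a_k^2 * k!, because E[He_k^2] = k!."""
    variance = sum(a * a * math.factorial(k) for k, a in enumerate(coeffs) if k > 0)
    return coeffs[0], variance

# Hypothetical inductance model driven by one normalized width variation xi.
toy_inductance = lambda xi: np.exp(0.1 * xi)
coeffs = pc_coefficients(toy_inductance, order=3)
mean, var = pc_mean_and_variance(coeffs)
print(coeffs, mean, var)
```

For this toy model the exact mean is exp(0.005), so the result can be checked against the closed form; the expansion itself is the kind of parameterized closed-form expression the abstract refers to.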

16.
Stretchable interconnects are fabricated on polymer substrates using metal patterns both as functional interconnect layers and as in situ masks for excimer laser photoablation. Single-layer and multilayer interconnects of various designs (rectilinear and “meandering”) have been fabricated, and certain “meandering” designs can be stretched uniaxially by up to 50% while maintaining good electrical conductivity and structural integrity. Compared with traditional fabrication approaches, this approach eliminates masks and microfabrication processing steps. Furthermore, the technology is scalable to large-area sensor arrays and electronic circuits, adaptable to a variety of materials and interconnect designs, and compatible with MEMS-based capacitive sensor technology.

17.
Crosstalk limits the achievable data rate of global on-chip interconnects on large CMOS ICs, especially when low-swing signaling is used to reduce power consumption. Differential interconnects provide a solution for most crosstalk and noise sources, but not for neighbor-to-neighbor crosstalk in a data bus. This neighbor-to-neighbor crosstalk can be reduced by twisting the differential interconnect pairs. To reduce via resistance and metal-layer use, we use as few twists as possible: only one twist in every even interconnect pair and only two twists in every odd interconnect pair. Analysis shows that there are optimal positions for the twists, which depend on the termination impedances of the interconnects. Theory and measurements on a 10-mm-long bus in 0.13-µm CMOS show that a single twist at 50% of the length of the even pairs, two twists at 30% and 70% of the length of the odd pairs, and both a low-ohmic source and a low-ohmic load impedance are very effective in mitigating the crosstalk.
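Why one twist on the even pairs and twists at 30% and 70% on the odd pairs can cancel neighbor coupling is easy to check in an idealized model: assume uniform coupling per unit length and ignore propagation delay and terminations, so each short segment contributes the product of the two pairs' polarities. This is an assumption made for illustration; the abstract notes that the optimal positions in practice depend on the termination impedances, which this sketch does not capture.

```python
def polarity(x, twists):
    """Orientation (+1 or -1) of a differential pair at normalized position x in [0, 1]."""
    return -1 if sum(1 for t in twists if x > t) % 2 else 1

def net_coupling(twists_a, twists_b, steps=1000):
    """Net neighbor-to-neighbor coupling, normalized so two untwisted pairs give 1.0.

    Idealized model: uniform coupling per unit length, no propagation or termination
    effects; each segment contributes the product of the two pairs' polarities.
    """
    total = 0
    for k in range(steps):
        x = (k + 0.5) / steps  # midpoint of each small segment
        total += polarity(x, twists_a) * polarity(x, twists_b)
    return total / steps

EVEN_TWISTS = [0.5]      # one twist at 50% of the length
ODD_TWISTS = [0.3, 0.7]  # two twists at 30% and 70% of the length
print(net_coupling([], []))                   # → 1.0 (untwisted neighbors)
print(net_coupling(ODD_TWISTS, EVEN_TWISTS))  # → 0.0 (segments cancel: +0.3 -0.2 +0.2 -0.3)
```

Because every odd pair is flanked by even pairs and vice versa, this placement cancels coupling to both neighbors of each pair in the idealized model.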

18.
JI Xiaoliang, TONG Xingyuan, WU Ruizhen, DU Ming (戢小亮, 佟星元, 吴睿振, 杜鸣)  Acta Electronica Sinica (《电子学报》), 2018, 46(12): 2964-2969
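To address the problem of integrated-circuit process-parameter fluctuations degrading chip yield, a buffer insertion algorithm for sequential circuits that improves chip yield is proposed. The algorithm uses Monte Carlo simulation to emulate fabricated chips and determines the best positions in the sequential circuit at which buffers can be inserted, reducing area and cost overhead while maintaining yield. The algorithm was verified by simulation on the ISCAS89 benchmark circuits and the TAU2013 circuits. The results show that the number of inserted buffers is at most 1% of the number of flip-flops, and yield is improved by up to 35.98%.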
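The abstract describes emulating fabricated chips with Monte Carlo simulation and choosing buffer positions that raise yield. The sketch below is a much simpler, hypothetical stand-in, not the published algorithm: one register-to-register path with a Gaussian delay, where a candidate buffer adds delay that can fix hold failures but, if too large, breaks setup. The path parameters and the names `timing_yield` and `best_buffer` are illustrative.

```python
import random

def timing_yield(buffer_delay, samples, period, setup, hold):
    """Fraction of sampled chips whose path delay plus buffer meets both setup and hold."""
    ok = sum(1 for d in samples if hold <= d + buffer_delay <= period - setup)
    return ok / len(samples)

def best_buffer(candidates, samples, period, setup, hold):
    """Pick the candidate buffer delay with the highest Monte Carlo yield."""
    return max(candidates, key=lambda b: timing_yield(b, samples, period, setup, hold))

# Hypothetical short path: mean delay 0.1 ns, sigma 0.05 ns, sampled over 20000 "chips".
rng = random.Random(42)
samples = [rng.gauss(0.1, 0.05) for _ in range(20000)]
PERIOD, SETUP, HOLD = 1.0, 0.1, 0.1
CANDIDATES = (0.0, 0.1, 0.2, 0.9)
for b in CANDIDATES:
    print(f"buffer {b:.1f} ns -> yield {timing_yield(b, samples, PERIOD, SETUP, HOLD):.4f}")
print("best:", best_buffer(CANDIDATES, samples, PERIOD, SETUP, HOLD))
```

Reusing the same samples for every candidate (common random numbers) keeps the comparison between buffer choices from being dominated by sampling noise.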

19.
For very deep sub-micrometer VLSI, crosstalk becomes an important issue affecting circuit performance and signal integrity. This paper analyzes two crosstalk fault effects in the system-on-chip (SOC) interconnect bus, namely glitches and crosstalk-induced delay, and proposes and demonstrates a unified scheme to detect them. The crosstalk-induced delay is found to be the superposition of the induced glitch and the applied signal on the victim line, and this effect has the greater impact on circuit performance. A pulse detector with an adjustable detection threshold is proposed to detect glitches and, consequently, the induced delay. Several issues affecting the yield of the proposed testing scheme are discussed, and Monte Carlo simulations are conducted to show the feasibility of the scheme.

20.
HE Bin, WANG Yu (何宾, 王瑜)  Electronic Design Engineering (《电子设计工程》), 2011, 19(13): 141-144
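The MicroBlaze core is a 32-bit RISC soft processor core with a Harvard architecture that is embedded in Xilinx FPGAs. To achieve fast communication between multiple Xilinx MicroBlaze soft processor cores over an inter-core interconnect, a mixed PLB and FSL bus interconnection approach was adopted, with the xps_mailbox and xps_mutex cores handling inter-core communication and synchronization. In an experiment on the Xilinx EDK platform in which three soft processor cores were embedded in a Spartan-3E FPGA, a multiprocessor-based embedded programmable system-on-chip running on the FPGA was developed. The results show that this mixed interconnection of multiple processor cores is feasible and practical and that inter-core communication speed is improved.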
