首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A Globally Asynchronous, Locally Synchronous (GALS) system with dynamic voltage and frequency scaling can use the slowest frequency possible to accomplish a task with minimal power consumption. With the mechanism for implementing dynamic voltage scaling at each synchronous domain left up to the designer, our Globally Asynchronous, Locally Dynamic System (GALDS) provides a top-down, system-level means to maximize power reduction in an integrated circuit and facilitate system-on-a-chip (SoC) design. Our solution includes three distinct components: a novel bidirectional asynchronous FIFO to communicate between independently clocked synchronous blocks , an all-digital dynamic clock generator to quickly and glitchlessly switch between frequencies and a digitally controlled oscillator to generate the global fixed frequency clocks required by the all-digital dynamic clock generator. In addition to being capable of reducing power consumption when combined with dynamic voltage scaling, a GALDS design benefits from numerous other advantages such as simplified clock distribution, high performance operation and faster time-to-market through the modular nature of the architecture.  相似文献   

2.
This paper describes the classification, detailed timing characterization, evaluation, and design of the dual-edge triggered storage elements (DETSE). The performance and power characterization of DETSE includes the effect of clocking at halved clock frequency and impact of load imposed by the storage element to the clock distribution network. The presented analysis estimates the timing penalty and power savings of a system based on DETSE, and gives design guidelines for high-performance and low-power application. In addition, the paper presents a class of dual-edge triggered flip-flops with clock load, delay, and internal power consumption comparable to the fastest single-edge triggered storage elements (SETSE). Our simulated results show that by halving the clock frequency, dual-edge clocking strategy can save about 50% of the power consumed by the clock distribution network, and relax the design of clock distribution system, while paying virtually no penalty in throughput.  相似文献   

3.
Ultra Low-Power Clocking Scheme Using Energy Recovery and Clock Gating   总被引:1,自引:0,他引:1  
A significant fraction of the total power in highly synchronous systems is dissipated over clock networks. Hence, low-power clocking schemes are promising approaches for low-power design. We propose four novel energy recovery clocked flip-flops that enable energy recovery from the clock network, resulting in significant energy savings. The proposed flip-flops operate with a single-phase sinusoidal clock, which can be generated with high efficiency. In the TSMC 0.25-$mu$m CMOS technology, we implemented 1024 proposed energy recovery clocked flip-flops through an H-tree clock network driven by a resonant clock-generator to generate a sinusoidal clock. Simulation results show a power reduction of 90% on the clock-tree and total power savings of up to 83% as compared to the same implementation using the conventional square-wave clocking scheme and flip-flops. Using a sinusoidal clock signal for energy recovery prevents application of existing clock gating solutions. In this paper, we also propose clock gating solutions for energy recovery clocking. Applying our clock gating to the energy recovery clocked flip-flops reduces their power by more than 1000 $times$ in the idle mode with negligible power and delay overhead in the active mode. Finally, a test chip containing two pipelined multipliers one designed with conventional square wave clocked flip-flops and the other one with the proposed energy recovery clocked flip-flops is fabricated and measured. Based on measurement results, the energy recovery clocking scheme and flip-flops show a power reduction of 71% on the clock-tree and 39% on flip-flops, resulting in an overall power savings of 25% for the multiplier chip.   相似文献   

4.
We propose a half-swing clocking scheme that allows us to reduce power consumption of clocking circuitry by as much as 75%, because all the clock signal swings are reduced to half of the LSI supply voltage. The new clocking scheme causes quite small speed degradation, because the random logic circuits in the critical path are still supplied by the full supply voltage. We also propose a clock driver which supplies half-swing clock and generates half VDD by itself. We confirmed that the half-swing clocking scheme provided 67% power saving in a test chip fabricated with 0.5 μm CMOS technology, ideally 75%, in the clocking circuitry, and that the degradation in speed was only 0.5 ns by circuit simulation. The key to the proposed clocking scheme is the concept that the voltage swing is reduced only for clocking circuitry, but is retained for all other circuits in the chip. This results in significant power reduction with minimal speed degradation  相似文献   

5.
We propose injection-locked clocking (ILC) to combat deteriorating clock skew and jitter, and reduce power consumption in high-performance microprocessors. In the new clocking scheme, injection-locked oscillators are used as local clock receivers. Compared to conventional clocking with buffered trees or grids, ILC can achieve better power efficiency, lower jitter, and much simpler skew compensation thanks to its built-in deskewing capability. Unlike other alternatives, ILC is fully compatible with conventional clock distribution networks. In this paper, a quantitative study based on circuit and microarchitectural-level simulations is performed. Alpha21264 is used as the baseline processor, and is scaled to 0.13 $mu$ m and 3 GHz. Simulations show 20- and 23-ps jitter reduction, 10.1% and 17% power savings in two ILC configurations. A test chip distributing 5-GHz clock is implemented in a standard 0.18- $mu$m CMOS technology and achieved excellent jitter performance and a deskew range up to 80 ps.   相似文献   

6.
This paper describes a novel communication scheme, which is guaranteed to be free of synchronization failures, amongst multiple synchronous and asynchronous modules operating independently. In this scheme, communication between every pair of modules is done through an asynchronous first-in first-out (FIFO) channel; communication between a module and the FIFO is done using a request/acknowledge handshaking. Synchronization of handshake signals to the local module clock is done in an unconventional way-the local clock built out of a ring oscillator is paused or stretched, if necessary, to ensure that the handshake signal satisfies setup and hold time constraints with respect to the local clock. In order to validate this scheme, we implemented a test chip in 0.5-μm CMOS. This chip is designed as a ring, composed of two synchronous modules, an asynchronous module, and two asynchronous FIFOs. Each module functions as a receiver to one module and a sender to another module. Test results show that the chip functions reliably up to 456 MHz  相似文献   

7.
This paper presents a resource allocation technique to design low-power register-transfer-level datapaths. The basis of this technique is to use a multiple clocking scheme of n nonoverlapping clocks, by dividing the frequency f of a single clock into n cycles, to partition the circuit into n disjoint modules and assign each module to a distinct clock, and to operate each module only during its corresponding duty cycle, thus clocking each module by a frequency f/n to reduce power. However, the overall effective frequency of the circuit remains f, i.e., the single clock frequency. Further power reduction is also obtained by tradeoffs between voltage, power, and delay across multiple clock partitions. Power savings up to 50% of the proposed multiple clocking scheme in comparison to single gated clock designs are also reported  相似文献   

8.
Globally asynchronous, locally synchronous (GALS) systems-on-chip (SoCs) may be prone to synchronization failures if the delay of their locally-generated clock tree is not considered. This paper presents an in-depth analysis of the problem and proposes a novel solution. The problem is analyzed considering the magnitude of clock tree delays, the cycle times of the GALS module, and the complexity of the asynchronous interface controllers using a timed signal transition graph (STG) approach. In some cases, the problem can be solved by extracting all the delays and verifying whether the system is susceptible to metastability. In other cases, when high data bandwidth is not required, matched-delay asynchronous ports may be employed. A novel architecture for synchronizing inter-modular communications in GALS, based on locally delayed latching (LDL), is described. LDL synchronization does not require pausable clocking, is insensitive to clock tree delays, and supports high data rates. It replaces complex global timing constraints with simpler localized ones. Three different LDL ports are presented. The risk of metastability in the synchronizer is analyzed in a technology-independent manner  相似文献   

9.
Asynchronous Techniques for System-on-Chip Design   总被引:3,自引:0,他引:3  
SoC design will require asynchronous techniques as the large parameter variations across the chip will make it impossible to control delays in clock networks and other global signals efficiently. Initially, SoCs will be globally asynchronous and locally synchronous (GALS). But the complexity of the numerous asynchronous/synchronous interfaces required in a GALS will eventually lead to entirely asynchronous solutions. This paper introduces the main design principles, methods, and building blocks for asynchronous VLSI systems, with an emphasis on communication and synchronization. Asynchronous circuits with the only delay assumption of isochronic forks are called quasi-delay-insensitive (QDI). QDI is used in the paper as the basis for asynchronous logic. The paper discusses asynchronous handshake protocols for communication and the notion of validity/neutrality tests, and completion tree. Basic building blocks for sequencing, storage, function evaluation, and buses are described, and two alternative methods for the implementation of an arbitrary computation are explained. Issues of arbitration, and synchronization play an important role in complex distributed systems and especially in GALS. The two main asynchronous/synchronous interfaces needed in GALS-one based on synchronizer, the other on stoppable clock-are described and analyzed.  相似文献   

10.
This paper presents a custom chip for linearization of RF power amplifiers using digital predistortion. The chip has been implemented in a standard digital 0.8 m CMOS process with standard static cells and single-phase clocking. A systolic complex multiplier based on distributed arithmetic constitutes the core of the chip. The nonlinear function is realized with a look-up table containing complex gain factors applied to the complex multiplier. Maximum clock frequency was found by means of simulation to be 105 MHz corresponding to 21 Msamples/s throughput with 3 W power consumption using 5 V supply voltage. The fabricated chip is fully functional and has been measured up to 60 MHz clock frequency with 825 mW power consumption with 3.3 V supply voltage. Operation at 1.5 V supply voltage allows 10 MHz clock frequency with 35 mW power consumption.  相似文献   

11.
An energy recovery or resonant clocking scheme is very attractive for saving the clock power in nanoscale ASICs and systems-on-chips, which have increased functionality and die sizes. The technology scaling followed Moore’s law, that lowers node capacitance and supply voltage, making nanoscale integrated circuits more vulnerable to radiation-induced single event upsets (SEUs) or soft errors. In this work, we propose soft-error robust flip-flops (FFs) capable of working with a sinusoidal resonant clock to save the overall chip power. The proposed conditional-pass Quatro (CPQ) FF and true single phase clock energy recovery (TSPCER) FF are based on a unique soft error robust latch, which we refer to as a Quatro latch. The proposed C2-DICE FF is based on a dual interlocked cell (DICE) latch. In addition to the storage cell, each FF consists of a unique input-stage and a two-transistor, two-input output buffer. In each FF with a sinusoidal clock, the transfer unit passes the data to the Quatro and DICE latches. The latches store the data values at two storage nodes and two redundant nodes, the latter enabling recovery from a particle-induced transient with or without multiple-node charge sharing. Post-layout simulations in 65nm CMOS technology show that the FF exhibits as much as 82% lower power-delay product compared to recently reported soft error robust FFs. We implemented 1024 proposed FFs distributed in an H-tree clock network driven by a resonant clock-generator that generates a 1–5 GHz sinusoidal clock signal. The simulation results show a power reduction of 93% on the clock tree and total power saving of up to 74% as compared to the same implementation using the conventional square-wave clocking scheme and FFs.  相似文献   

12.
In this paper the authors evaluate the timing and power performance of three skew-tolerant clocking schemes. These schemes are the well known master–slave clocking scheme (MS) and two schemes developed by the authors: Parallel alternating latches clocking scheme (PALACS) and four-phase parallel alternating latches clocking scheme (four-phase PALACS). In order to evaluate the timing performance, the authors introduce algorithms to obtain the clock waveforms required by a synchronous sequential circuit. Separated algorithms were developed for every clocking scheme. From these waveforms it is possible to get parameters such as the non-overlapping time and the clock period. They have been implemented in a tool and have been used to compare the timing performance of the clocking schemes applied to a simple circuit. To analyse the power consumption the authors have electrically simulated a simple circuit for several operation frequencies. The most remarkable conclusion is that it is possible to save about 50% of the power consumption of the clock distribution network by using PALACS.  相似文献   

13.
Recent research has demonstrated that for certain types of applications like sampled audio systems, self-timed circuits can achieve very low power consumption, because unused circuit parts automatically turn into a stand-by mode. Additional savings may be obtained by combining the self-timed circuits with a mechanism that adaptively adjusts the supply voltage to the smallest possible, while maintaining the performance requirements. This paper describes such a mechanism, analyzes the possible power savings, and presents a demonstrator chip that has been fabricated and tested. The idea of voltage scaling has been used previously in synchronous circuits, and the contributions of the present paper are: 1) the combination of supply scaling and self-timed circuitry which has some unique advantages, and 2) the thorough analysis of the power savings that are possible using this technique  相似文献   

14.
A burst-mode clock recovery circuit with a novel dual bit-rate structure is presented. It utilizes two gated oscillators to align the clock with data edges and can operate in half-rate clocking mode, doubling data throughput, as well as in full-rate clocking mode. The gated oscillator reset-phase control scheme causes the starting phase of gated oscillators to alternate repeatedly between 0deg and 180deg according to the current clock phase. A prototype chip was designed with the 0.18-mum CMOS technology, and a 1.25/2.5-Gb/s dual-mode operation was verified by measurement  相似文献   

15.
Rotary clock is a resonant clocking technique that delivers on-chip clock signal distribution with very low power dissipation. Since it can only generate clock signals with multiple phases that are spatially distributed, rotary clock is often considered not applicable to industrial very large scale integration (VLSI) designs. This paper presents the first rotary-clock-based nontrivial digital circuit. Our design, a low-power and high-speed finite-impulse response (FIR) filter, is fully digital and generated using CMOS standard cells in 0.18 mum technology. We have shown that the proposed FIR filter is seamlessly integrated with the rotary clock technique. It uses the spatially distributed multiple clock phases of rotary clock and achieves high power savings. Simulation results demonstrate that our rotary-clock-based FIR filter can operate successfully at 610 MHz, providing a throughput of 39 Gb/s. In comparison with the conventional clock-tree-based design, our design achieves a 34.6% clocking power saving and a 12.8% overall circuit power saving. In addition, the peak current consumed by the rotary-clock-based filter is substantially lower by 40% on the average. Our study makes the crucial step toward the application of rotary clock technique to a broad range of VLSI designs.  相似文献   

16.
Presents a synchronous solution for clocking VLSI systems organized as distributed systems. This solution avoids the drawbacks of the self-timed approach. These VLSI systems are constituted of modules which represent synchronous areas driven by their own fast clock, interconnected by a synchronous communication mechanism driven by a slow clock. In order to avoid the risk of metastability in flip-flop between the modules and the communication mechanism, the author suggests to resynchronize the phase of each module clock on the transitions of the communication clock by a phase locked loop circuitry added to each module.  相似文献   

17.
A critical problem in building long systolic arrays lies in efficient and reliable synchronization. We address this problem in the context of synchronous systems by introducing probabilistic models for two alternative clock distribution schemes: tree and straight-line clocking. We present analytic bounds for the Probability of Failure and the Mean Time to Failure, and examine the trade-offs between reliability and throughput in both schemes. Our basic conclusion is that as the one-dimensional systolic array gets very long, tree clocking becomes more reliable than straight-line clocking.  相似文献   

18.
A voltage scaling technique for energy-efficient operation requires an adaptive power-supply regulator to significantly reduce dynamic power consumption in synchronous digital circuits. A digitally controlled power converter that dynamically tracks circuit performance with a ring oscillator and regulates the supply voltage to the minimum required to operate at a desired frequency is presented. This paper investigates the issues involved in designing a fully digital power converter and describes a design fabricated in a MOSIS 0.8-μm process. A variable-frequency digital controller design takes advantage of the power savings available through adaptive supply-voltage scaling and demonstrates converter efficiency greater than 90% over a dynamic range of regulated voltage levels  相似文献   

19.
Dual-supply voltage design using a clustered voltage scaling (CVS) scheme is an effective approach to reduce chip power. The optimal CVS design relies on a level converter implemented in a flip-flop to minimize energy, delay, and area penalties due to level conversion. Additionally, circuit robustness against supply bounce is a key property that differentiates good level converter design. Novel flip-flops presented in this paper incorporate a half-latch level converter and a precharged level converter. These flip-flops are optimized in the energy-delay design space to achieve over 30% reduction of energy-delay product and about 10% savings of total power in a CVS design as compared to the conventional flip-flop. These benefits are accompanied by 24% flip-flop robustness improvement leading to 13% delay spread reduction in a CVS critical path. The proposed flip-flops also show 18% layout area reduction. Advantages of level conversion in a flip-flop over asynchronous level conversion in combinational logic are also discussed in terms of delay penalty and its sensitivity to supply bounce.  相似文献   

20.
Power dissipation is becoming a prime design constraint in VLSI systems. The new key words for evaluating a design's performance are low power and high speed. This requires an overall system design review that considers suitable algorithms, architectures, circuits, and technology. In synchronous systems, the clocking network sets the frame that contains the whole design. It must be simple and robust. Power consumption in the clock distribution network has usually been a substantial part of the system total power consumption. New true single phase latches and flip flops are presented that are slope-insensitive, fast, and have data dependent power consumption. Flip flops are presented that work between DC and 1.7 GHz clock frequencies in a 1 μm CMOS technology. Methods are given that result in power saving in the clock system by reducing the clock rate by half for the same data throughput on the system level  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号