首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
赵冰  仇玉林  吕铁良  黑勇   《电子器件》2006,29(3):613-616
针对一种异步实现结构的异步快速傅立叶变换处理器,给出了处理器中异步加法器的电路和异步乘法器的结构.该异步快速傅立叶变换处理器采用本地的握手信号代替了传统的整体时钟.通过对一个8点的异步快速傅立叶变换处理器电路仿真,得到该处理器的平均响应时间为31.15ns,仅为最差响应时间42.85ns的72.7%.由此可见,异步快速傅立叶变换处理器在性能方面较同步处理器存在优势。  相似文献   

2.
Globally Asynchronous Locally Synchronous Network on Chip (GALS NoC) is one of the possible interconnect platforms in multiprocessor systems on a chip. Designing proper links and buffers in these circuits can improve their performance. An asynchronous pipeline is a key element in buffer designs. The type of pipeline and its size can influence the performance metrics such as power consumption and delay. However, asynchronous pipelines face some challenges such as performance evaluation, verification, and process variation. We consider a new formal model to overcome these challenges simultaneously. In this paper, a new statistical model for asynchronous pipelines based on Generalized Stochastic Petri Net (GSPN) has been developed. This model can be applied to different pipeline stages, in order to compare them based on the statistical analysis of performance metrics (power consumption and delay), and to analyze their performance and timing verification in presence of variation. We have explored various kinds of asynchronous pipelines, and their corresponding results show this model has reasonable accuracy in average (below 5%) and in variance, compared to the low level Monte Carlo Hspice simulation.  相似文献   

3.
赵冰  仇玉林  吕铁良  黑勇 《微电子学》2006,36(4):396-399
介绍一种采用异步实现结构的快速傅里叶变换处理器,该处理器的控制采用本地握手信号取代传统的系统时钟。给出了处理器中异步加法器的电路结构,设计了一个采用Booth译码Wallace tree结构的异步乘法器。通过对一个8点的异步快速傅里叶变换处理器进行电路仿真,得到该处理器完成一次变换的平均响应时间为31.15 ns,仅为最差响应时间42.85 ns的72.7%。可见,采用异步方式的快速傅里叶变换处理器在性能方面较同步处理器存在优势。  相似文献   

4.
介绍了一种适用于Viterbi解码器的异步ACS(加法器-比较器-选择器)的设计.它采用异步握手信号取代了同步电路中的整体时钟.给出了一种异步实现结构的异步加法单元、异步比较单元和异步选择单元电路.采用全定制设计方法设计了一个异步4-bit ACS,并通过0.6μm CMOS工艺进行投片验证.经过测试,芯片在工作电压5V,工作频率20MHz时的功耗为75.5mW.由于采用异步控制,芯片在"睡眠"状态待机时不消耗动态功耗.芯片的平均响应时间为19.18ns,仅为最差响应时间23.37ns的82%.通过与相同工艺下的同步4-bit ACS在功耗和性能方面仿真结果的比较,可见异步ACS较同步ACS具有优势.  相似文献   

5.
This paper describes a new technique for integrating asynchronous modules within a high-speed synchronous pipeline. Our design eliminates potential metastability problems by using a clock generated by a stoppable ring oscillator, which is capable of driving the large clock load found in present day microprocessors. Using the ATACS design tool, we designed highly optimized transistor-level circuits to control the ring oscillator and generate the clock and handshake signals with minimal overhead. Our interface architecture requires no redesign of the synchronous circuitry. Incorporating asynchronous modules in a high-speed pipeline improves performance by exploiting data-dependent delay variations. Since the speed of the synchronous circuitry tracks the speed of the ring oscillator under different processes, temperatures, and voltages, the entire chip operates at the speed dictated by the current operating conditions, rather than being governed by the worst case conditions. These two factors together can lead to a significant improvement in average-case performance. The interface design is simulated using the 0.6-μm HP CMOS14B process in HSPICE  相似文献   

6.
Glitches due to the secondary neutron particles from cosmic rays cause soft errors in integrated circuits (IC) that are becoming a major threat in modern sub 45nm ICs. Therefore, researchers have developed many techniques to mitigate the soft errors and some of them utilize the built in error detection schemes of low-power asynchronous null conventional logic (NCL). However, it requires extensive simulations and emulations for careful and complete analysis of the design, which can be costly, time consuming and cannot encompass all the possible input conditions. In this paper, we propose a framework to improve the soft error tolerant asynchronous pipelines by identifying and formally analyzing the vulnerable paths using the nuXmv model checker. The proposed framework translates the design behavior and specification into a state-space model and the potential vulnerabilities against soft errors in the pipeline as linear temporal logical (LTL) properties. These formally specified properties are then verified on the state-space model and in case of failure counterexamples are obtained. These counterexamples can then be further analyzed to obtain the soft error propagation paths and thus give insights about soft error tolerant approaches to the designers. For illustration, this work provides an analysis and comparison of three state-of-the-art asynchronous pipelines. Formal model and analysis of all the pipelines show that the soft error hardened pipeline is comparatively superior against soft errors but at the expense of almost two times area overhead.  相似文献   

7.
This paper introduces a high-throughput asynchronous pipeline style, called high-capacity (HC) pipelines, targeted to datapaths that use dynamic logic. This approach includes a novel highly-concurrent handshake protocol, with fewer synchronization points between neighboring pipeline stages than almost all existing asynchronous dynamic pipelining approaches. Furthermore, the dynamic pipelines provide 100% buffering capacity, without explicit latches, by means of separate pullup and pulldown control for each pipeline stage: neighboring stages can store distinct data items, unlike almost all existing latchless dynamic asynchronous pipelines. As a result, very high throughput is obtained. Fabricated first-input-first-output (FIFO) designs, in 0.18-m technology, were fully functional over a wide range of supply voltages (1.2 to over 2.5 V), exhibiting a corresponding range of throughputs from 1.0-2.4 giga items/s. In addition, an experimental finite-impulse response (FIR) filter chip was designed and fabricated with IBM Research, whose speed-critical core used an HC pipeline. The HC pipeline exhibited throughputs up to 1.8 giga items/s, and the overall filter achieved 1.32 giga items/s, thus obtaining 15% higher throughput and 50% lower latency than the fastest previously-reported synchronous FIR filter, also designed at IBM Research.  相似文献   

8.
We describe the design and implementation of an asynchronous discrete cosine transform/inverse discrete cosine transform (DCT/IDCT) processor core compliant with the CCITT recommendation H.261. First, a micropipelined implementation with level-sensitive latches is shown. This is improved by replacing the level-sensitive latches with dual-edge triggered flip-flops to save power and using completion-detection adders in the critical stage of the pipeline to exploit the data-dependent processing delay. Gate-level simulation of extracted layouts indicates that the performance of asynchronous implementations is comparable with that of a synchronous implementation based on an identical architecture. This is because part of the penalty introduced by handshaking circuitry in an asynchronous pipeline can be recovered by exploiting data-dependent processing delays with completion-detection circuitry. In pipelines with significant arithmetic processing such as the DCT/IDCT processor, this is easily accomplished. Our results are encouraging because asynchronous designs do not employ global clocking. In the near future when clock generation, clock distribution, and the power consumed in the clock circuitry become limiting factors in the design of large synchronous application specific integrated circuits (ASICs), asynchronous implementation methodology could be pursued as a real alternative  相似文献   

9.
介绍一种新型异步 ACS(加法器 -比较器 -选择器 )的设计。一种异步实现结构的异步比较器 ,并通过异步加法单元、比较单元和选择单元的异步互连 ,构成了异步 ACS。在异步 ACS的性能分析时采用了一种基于多延迟模型的新方法 ,建立了异步加法器和比较器的多延迟模型 ,通过逻辑仿真 ,得到异步 ACS的平均响应时间为 3 .66ns,最长响应时间为 8.1 ns。由此可见 ,异步 ACS在性能方面较同步 ACS存在优势。  相似文献   

10.
Process-induced and environmental fluctuations play an important role in the design process for modern high-performance integrated circuits. The conventional principle of considering the verification of worst-case requirements reduces the performance that can potentially be achieved by circuits and technology. This paper presents a new mechanism that permits the compensation of random independent delay fluctuations due to environmental factors. It shows that it is significantly possible to reduce the latency time of a circuit even for a moderate length of pipeline stages.  相似文献   

11.
Two FIFO ring performance experiments   总被引:1,自引:0,他引:1  
Asynchronous circuits are often perceived to operate slower than equivalent clocked circuits. We demonstrate with fabricated chips that asynchronous circuits can be every bit as fast as clocked circuits. We describe two high-speed first-in-first-out (FIFO) circuits that we used to compare the performance of asynchronous FIFOs with that of conventionally clocked shift registers. The first FIFO circuit uses a pulse-like protocol, which we call the Asynchronous Symmetric Persistent Pulse Protocol (asP*), to advance data along a pipeline of conventional latches. Use of this protocol requires careful management of circuit delays. The second FIFO circuit uses a transition signaling protocol and special transition latches to store data. These transition latches are fast, but they are about 50% larger than conventional latches. Measurements obtained from chips fabricated in 0.6 μm CMOS and from SPICE simulations show that the throughput of the first FIFO design matches that of a conventionally clocked shift register design, with a maximum throughput of 1.1 Giga data items per second. The throughput of the second design exceeds the performance of the asP* design and achieves a maximum throughput of 1.7 Giga data items per second. We have extensively tested the chips and have found them to operate reliably over a very wide range of conditions  相似文献   

12.
This paper demonstrates the design of efficient asynchronous bundled-data pipelines for the matrix-vector multiplication core of discrete cosine transforms (DCTs). The architecture is optimized for both zero and small-valued data, typical in DCT applications, yielding both high average performance and low average power. The proposed bundled-data pipelines include novel data-dependent delay lines with integrated control circuitry to efficiently implement speculative completion sensing. The control circuits are based on a novel control-circuit template that simplifies the design of such nonlinear pipelines. Extensive post-layout back-end timing analysis was performed to gain confidence in the timing margins as well as to quantify performance and energy. Comparison with a synchronous counterpart suggests that our best asynchronous design yields 30% higher average throughput with negligible energy overhead.  相似文献   

13.
Estimation of asynchronous circuit performances, such as speed, is one of the major reasons that still keeps that design style undeservedly unpopular. A simple logic simulator would be very useful in overcoming this problem if it is used for evaluating the worst-case delays in the paths of an asynchronous circuit using. In this paper a method for timing analysis of the asynchronous circuits using a VHDL simulator is presented. It is capable to deal with both non-sequential and sequential asynchronous circuit. An appropriate extension of the standard logic simulation process enables all worst-case delays for all paths in a digital circuit to be obtained with only one run of the simulation. High levels of accuracy are achieved using extensive gate modelling while statistical analysis of the results was also used to evaluate part of the parametric yield loss related to the delay. Due to the lack of asynchronous benchmark circuits, the method is verified on a set of asynchronous circuit selected by the authors.  相似文献   

14.
An asynchronous pipeline style is introduced for high-speed applications, called MOUSETRAP. The pipeline uses standard transparent latches and static logic in its datapath, and small latch controllers consisting of only a single gate per pipeline stage. This simple structure is combined with an efficient and highly-concurrent event-driven protocol between adjacent stages. Post-layout SPICE simulations of a ten-stage pipeline with a 4-bit wide datapath indicate throughputs of 2.1-2.4 GHz in a 0.18-mum TSMC CMOS process. Similar results were obtained when the datapath width was extended to 16 bits. This performance is competitive even with that of wave pipelines, without the accompanying problems of complex timing and much design effort. Additionally, the new pipeline gracefully and robustly adapts to variable speed environments. The pipeline stages are extended to fork and join structures, to handle more complex system architectures.  相似文献   

15.
设计了一种8位1.2V,1GS/s双通道流水线A/D转换器(ADC)。所设计ADC对1.5位增益D/A转换电路(MDAC)中的流水线双通道结构进行改进,其中设置有双通道流水线时分复用运算放大器和双/单通道快闪式ADC,以简化结构并提高速度;在系统前置采样/保持器中加设由单一时间信号驱动的开关线性化控制(SLC)电路,以解决两条通道之间的采样歪扭和时序失调问题。用90nm标准CMOS工艺对所设计的流水线ADC进行仿真试验,结果表明,室温下所设计ADC的信噪比SNR为32.7dB,无杂散动态范围SFDR为42.3dB,它的分辨率、功耗PD和采样速率SR分别为8位、23mW和1GS/s,从而满足了高速、高精度和低功耗的应用需要。  相似文献   

16.
A new class of asynchronous pipelines is proposed, called lookahead pipelines (LP), which use dynamic logic and are capable of delivering multi-gigahertz throughputs. Since they are asynchronous, these pipelines avoid problems related to high-speed clock distribution, such as clock power, management of clock skew, and inflexibility in handling varied environments. The designs are based on the well-known PSO style of Williams and Horowitz as a starting point, but achieve significant improvements through novel protocol optimizations: the pipeline communication is structured so that critical events can be detected and exploited earlier. A special focus of this work is to target extremely fine-grain or gate-level pipelines, where the datapath is sectioned into stages, each consisting of logic that is only a single level deep. Both dual-rail and single-rail pipeline implementations are proposed. All the implementations are characterized by low-cost control structures and the avoidance of explicit latches. Post-layout SPICE simulations, in 0.18-mum technology, indicate that the best dual-rail LP design has more than twice the throughput (1.04 giga data items/s) of Williams' PSO design, while the best single-rail LP design achieves even higher throughput (1.55 giga data items/s).  相似文献   

17.
While adders are usually designed for the worst-case where their carry propagates through the entire bits, those cases rarely happen at real operation. This work takes advantage of the infrequent worst-case occurrences by designing adders for the average-case. Such design implies that computation errors may happen. Those are being corrected by implementing multi-mode addition with the aid of a dedicated control circuit. A power-delay-energy model is presented, enabling to find the optimal design point. We show that for cases where the system׳s critical paths are dictated by the adders, the system׳s operation voltage can be scaled, without harming the clock cycle and with very small performance degradation. For an adder per-se, potential energy savings of up to 50% is shown. The multi-mode adder has been integrated in a 32-bit pipelined MIPS processor, validating the correctness of such design approach.  相似文献   

18.
We study the packet transmission scheduling problem with tuning delay in wavelength-division multiplexed (WDM) optical communication networks with tunable transmitters and fixed-tuned receivers. By treating the numbers of packets as random variables, we conduct probabilistic analysis of the average-case performance ratio for the cyclic packet transmission scheduling algorithm. Our numerical data as well as simulation results demonstrate that the average-case performance ratio of cyclic schedules is very close to one for reasonable system configurations and probability distributions of the numbers of packets. In particular, when the number of receivers that share a channel and/or the granularity of packet transmission are large, the average-case performance ratio approaches one. Better performance can be achieved by overlapping tuning delays with packet transmission. We derive a bound for the normalized tuning delay Δ such that tuning delay can be completely masked with high probability. Our study implies that by using currently available tunable optical transceivers, it is possible to build single-hop WDM networks that efficiently utilize all the wavelengths.  相似文献   

19.
The performance of many modern computer and communication systems is dictated by the latency of communication pipelines. At the same time, power/energy consumption is often another limiting factor in many portable systems. We address the problem of how to minimize the power consumption in system-level pipelines under latency constraints. In particular, we apply fragmentation technique to achieve parallelism and exploit advantages provided by variable voltage design methodology to optimally select voltage and, therefore, speed of each pipeline stage. We focus our study on the practical case when each pipeline stage operates at a fixed speed. Unlike the conventional pipeline system, where all stages run at the same speed, our system may have different stages running at different speeds to conserve energy while providing guaranteed latency. For a given latency requirement, we find explicit solutions for the most energy efficient fragmentation and and voltage setting. We further study a less practical case when each stage can dynamically change its speed to get further energy saving. We define the problem and transform it to a nonlinear system whose solution provides a lower bound for energy consumption.  相似文献   

20.
异步集成电路标准单元的设计与实现   总被引:1,自引:1,他引:0  
赵冰  仇玉林  黑勇   《电子器件》2005,28(2):346-348,351
设计异步集成电路时,常用的异步标准单元的分类、电路设计方法和电路结构.详细介绍了C单元和异步数据通路的设计与实现,提出了一种异步实现结构的异步加法单元、异步比较单元和异步选择单元电路.利用设计的异步标准单元构成了一个适用于Viterbi解码器的异步ACS(加法器一比较器一选择器),并通过0.6μmCMOS工艺进行投片验证.当芯片工作电压为5V,工作频率为20MHz时的功耗为75.5mW.芯片的平均响应时问为19.18DS,仅为最差响应时间23.37ns的82%.从而验证了异步标准单元的正确性和异步电路在性能方面较同步电路存在的优势.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号