首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 546 毫秒
1.
As system-level interconnect incurs increasing penalties in latency, round-trip cycle time and power, and as timing-variability becomes an increasing design challenge, there is renewed interest in using two-phase delay-insensitive asynchronous protocols for robust system-level communication. However, in practice, it is extremely inefficient to build local asynchronous computation nodes with two-phase logic, hence four-phase (i.e., return-to-zero) computation blocks are typically used. This paper proposes two new architecture for a family of asynchronous protocol converters that translate between two- and four-phase protocols, thus facilitating robust system design using efficient global two-phase communication and local four-phase computation. A converter circuit is implemented and evaluated a 0.18 micron TSMC process through post-layout simulation, assuming both a small computation block (8 $,times,$8 combinational multiplier) and an empty computation block (FIFO stage).   相似文献   

2.
本文以异步流水乘法器的设计为例,介绍了利用FPGA进行异步电路设计的思路及方法。本设计采用两段握手协议实现异步流水乘法器,将其分解为三个核心模块:信号分支模块、异步移位模块和异步加法器模块。本文具体说明了利用硬件描述语言实现异步乘法器的方法和步骤,通过Modelsim软件进行功能仿真,并下载到Genesys板卡上进行系统测试。该教学方案有助于学生理解并掌握异步电路设计方法。  相似文献   

3.
李鸿  叶燕 《数字通信》2012,39(1):83-85
介绍了基于四相单轨握手协议的异步FIR滤波器接口电路的设计。分析了异步接口电路互联模型及原理,提出了一种基于存储器和状态机实现异步握手接口电路的全新方案,实现了FIR滤波器功能和异步接口控制功能。对设计进行了波形仿真及结果分析,验证了设计方案的可行性。  相似文献   

4.
赵冰  仇玉林  吕铁良  黑勇   《电子器件》2006,29(3):613-616
针对一种异步实现结构的异步快速傅立叶变换处理器,给出了处理器中异步加法器的电路和异步乘法器的结构.该异步快速傅立叶变换处理器采用本地的握手信号代替了传统的整体时钟.通过对一个8点的异步快速傅立叶变换处理器电路仿真,得到该处理器的平均响应时间为31.15ns,仅为最差响应时间42.85ns的72.7%.由此可见,异步快速傅立叶变换处理器在性能方面较同步处理器存在优势。  相似文献   

5.
赵冰  仇玉林  吕铁良  黑勇 《微电子学》2006,36(4):396-399
介绍一种采用异步实现结构的快速傅里叶变换处理器,该处理器的控制采用本地握手信号取代传统的系统时钟。给出了处理器中异步加法器的电路结构,设计了一个采用Booth译码Wallace tree结构的异步乘法器。通过对一个8点的异步快速傅里叶变换处理器进行电路仿真,得到该处理器完成一次变换的平均响应时间为31.15 ns,仅为最差响应时间42.85 ns的72.7%。可见,采用异步方式的快速傅里叶变换处理器在性能方面较同步处理器存在优势。  相似文献   

6.
This paper deals with a new approach to the design of high-performance asynchronous pipelined datapaths. A novel methodology to implement the self-timed stages of a data-path is demonstrated. It is based on the use of both static and dynamic CMOS modules. The former act as overlapped execution circuits and anticipate their computation with respect to the dynamic blocks. An appropriate four-phase protocol able to orchestrate the proposed architecture and a new efficient handshake circuit are described. The above method, applied to a 32-bit addition stage, allows a performance gain to be obtained of up to about 40% and a reduction in power dissipation of about 33%, with a reasonable area overhead compared with conventional designs.  相似文献   

7.
余洪敏  陈陵都  刘忠立 《半导体学报》2008,29(11):2218-2225
提出了一种新的嵌入在FPGA中可重构的流水线乘法器设计.该设计采用了改进的波茨编码算法,可以实现18×18有符号乘法或17×17无符号乘法.还提出了一种新的电路优化方法来减少部分积的数目,并且提出了一种新的乘法器版图布局,以便适应tilebased FPGA芯片设计所加的约束.该乘法器可以配置成同步或异步模式,也町以配置成带流水线的模式以满足高频操作.该设计很容易扩展成不同的输入和输出位宽.同时提出了一种新的超前进位加法器电路来产生最后的结果.采用了传输门逻辑来实现整个乘法器.乘法器采用了中芯国际0.13μm CMOS工艺来实现,完成18×18的乘法操作需要4.1ns.全部使用2级的流水线时,时钟周期可以达到2.5ns.这比商用乘法器快29.1%,比其他乘法器快17.5%.与传统的基于查找表的乘法器相比,该乘法器的面积为传统乘法器面积的1/32.  相似文献   

8.
A new class of asynchronous pipelines is proposed, called lookahead pipelines (LP), which use dynamic logic and are capable of delivering multi-gigahertz throughputs. Since they are asynchronous, these pipelines avoid problems related to high-speed clock distribution, such as clock power, management of clock skew, and inflexibility in handling varied environments. The designs are based on the well-known PSO style of Williams and Horowitz as a starting point, but achieve significant improvements through novel protocol optimizations: the pipeline communication is structured so that critical events can be detected and exploited earlier. A special focus of this work is to target extremely fine-grain or gate-level pipelines, where the datapath is sectioned into stages, each consisting of logic that is only a single level deep. Both dual-rail and single-rail pipeline implementations are proposed. All the implementations are characterized by low-cost control structures and the avoidance of explicit latches. Post-layout SPICE simulations, in 0.18-mum technology, indicate that the best dual-rail LP design has more than twice the throughput (1.04 giga data items/s) of Williams' PSO design, while the best single-rail LP design achieves even higher throughput (1.55 giga data items/s).  相似文献   

9.
This paper presents a novel variable-latency multiplier architecture, suitable for implementation as a self-timed multiplier core or as a fully synchronous multicycle multiplier core. The architecture combines a second-order Booth algorithm with a split carry save array pipelined organization, incorporating multiple row skipping and completion-predicting carry-select dual adder. The paper reports the architecture and logic design, CMOS circuit design and performance evaluation. In 0.35 μm CMOS, the expected sustainable cycle time for a 32-bit synchronous implementation is 2.25 ns. Instruction level simulations estimate 54% single-cycle and 46% two-cycle operations in SPEC95 execution. Using the same CMOS process, the 32-bit asynchronous implementation is expected to reach an average 1.76 ns throughput and 3.48 ns latency in SPEC95 execution  相似文献   

10.
Asynchronous Techniques for System-on-Chip Design   总被引:3,自引:0,他引:3  
SoC design will require asynchronous techniques as the large parameter variations across the chip will make it impossible to control delays in clock networks and other global signals efficiently. Initially, SoCs will be globally asynchronous and locally synchronous (GALS). But the complexity of the numerous asynchronous/synchronous interfaces required in a GALS will eventually lead to entirely asynchronous solutions. This paper introduces the main design principles, methods, and building blocks for asynchronous VLSI systems, with an emphasis on communication and synchronization. Asynchronous circuits with the only delay assumption of isochronic forks are called quasi-delay-insensitive (QDI). QDI is used in the paper as the basis for asynchronous logic. The paper discusses asynchronous handshake protocols for communication and the notion of validity/neutrality tests, and completion tree. Basic building blocks for sequencing, storage, function evaluation, and buses are described, and two alternative methods for the implementation of an arbitrary computation are explained. Issues of arbitration, and synchronization play an important role in complex distributed systems and especially in GALS. The two main asynchronous/synchronous interfaces needed in GALS-one based on synchronizer, the other on stoppable clock-are described and analyzed.  相似文献   

11.
This work presents a design flow for asynchronous, self-timed dual-rail circuits which introduces a timing assumption in the return-to-spacer phase. The design flow enables power proportionality and is demonstrated through the design of a 32-bit ripple-carry adder and a 32-bit comparator for internet of things applications. The designs are synthesized to a 65 nm cell library with state-of-the-art transistor sizing for subthreshold. Simulation results show improved performance and energy per computation across operating conditions compared with single-rail equivalents. The design flow allows extension of the power proportional philosophy to a wider range of circuits.  相似文献   

12.
何安平  刘晓庆  陈虹 《电子学报》2018,46(4):961-968
以乘法器为代表的算术运算单元是现代数字系统的核心之一,其计算速度在很大程度上影响整个芯片的运算效率.本论文提出了一种改进的Booth乘法算法,其核心思想是先移位、再压缩,最后求和,减少了各模块间的耦合性,有利于控制电路的简化.本论文依据纯异步电路系统的设计方法,采用"约束数据捆绑"两相握手通讯协议的Click微流水线,根据控制和数据处理分离的策略,实现了这种改进算法的8位乘法器,并在FPGA上进行了验证.在45nm工艺制程的FPGA条件下,与相同体系结构的同步乘法器相比,这种异步乘法器在面积和功耗大体相同的情况下,运算速度大体提升超过12倍.  相似文献   

13.
Bartlett  V.A. Grass  E. 《Electronics letters》1997,33(22):1850-1852
A completion-detection technique is introduced for bundled-data asynchronous systems implemented in single-rail dynamic CMOS logic. Its application to a carry-ripple adder is presented showing significant benefits in terms of performance and area overhead  相似文献   

14.
This paper details a novel asynchronous pipelining methodology that maximizes the throughput buffering capacity and robustness of gate-level pipelined systems. The data paths in the proposed pipeline style are encoded using hybrid logic encoding scheme, which incorporates simplicity of the single-rail encoding and robustness of the dual-rail encoding. The control path that provides the synchronization between pipeline stages is constructed based on the simple and high-speed early acknowledgment protocol. Further, the proposed pipeline accommodates isolate phase to achieve 100% storage capacity. Two test cases: A 4-bit,10-stage FIFO and a 16-bit adder, have been designed in 90 nm technology to validate the proposed pipeline style. The FIFO has been laid out in the UMC 180 nm process using the cadence tool suite. The post-layout results of FIFO show 12.5% better throughput than the high capacity single-rail pipeline. Simulation results of the adder also reveal that the proposed structure achieves the throughput of 3.44 Giga-items/sec, which is 44.18% higher than the APCDP (Asynchronous pipeline based on constructed critical path) and 11.9% higher than the high capacity single-rail pipelines.  相似文献   

15.
文章基于GALS(Globally Asynchronous Locally Synchronous)设计理念,提出一个Core的异步接口设计模型:门控时钟停Core机制、握手机制、电平转脉冲逻辑等构成的异步控制信号处理模型:异步FIFO和双FIFO结构构成的异步数据处理模型。此结构允许Core和总线系统在完全异步的时钟域上工作。FPGA验证结果表明.该模型能正确地实现两者问的信号同步,并能满足具体应用的带宽需求。  相似文献   

16.
A synthesis flow oriented on producing the delay-insensitive dual-rail asynchronous logic is proposed. Within this flow, the existing synchronous logic synthesis tools are exploited to design technology independent single-rail synchronous Boolean network of complex (AND-OR) nodes. Next, the transformation into a dual-rail Boolean network is done. Each node is minimized under the formulated constraint to ensure hazard-free implementation. Then the technology dependent mapping procedure is applied. The MCNC and ISCAS benchmark sets are processed and the area overhead with respect to the synchronous implementation is evaluated. The implementations of the asynchronous logic obtained using the proposed (with AND-OR nodes) and the state-of-the-art (nodes are designed based on DIMS, direct logic and NCL) network structures are compared. A method, where nodes are designed as simple (NAND, NOR, etc.) gates is chosen for a detailed comparison. In our approach, the number of completion detection logic inputs is reduced significantly, since the number of nodes that should be supplied with the completion detection is less than in the case of the network structure that is based on simple gates. As a result, the improvement in sense of the total complexity and performance is obtained.  相似文献   

17.
Wave pipelining is a design technique for increasing the throughput of a digital circuit or system without introducing pipelining registers between adjacent combinational logic blocks in the circuit/system. However, this requires balancing of the delays along all the paths from the input to the output which comes the way of its implementation. Static CMOS is inherently susceptible to delay variation with input data, and hence, receives a low priority for wave pipelined digital design. On the other hand, ECL and CML, which are amenable to wave pipelining, lack the compactness and low power attributes of CMOS. In this paper we attempt to exploit wave pipelining in CMOS technology. We use a single generic building block in Normal Process Complementary Pass Transistor Logic (NPCPL), modeled after CPL, to achieve equal delay along all the propagation paths in the logic structure. An 8×8 b multiplier is designed using this logic in a 0.8 μm technology. The carry-save multiplier architecture is modified suitably to support wave pipelining, viz., the logic depth of all the paths are made identical. The 1 mm×0.6 mm multiplier core supports a throughput of 400 MHz and dissipates a total power of 0.6 W. We develop simple enhancements to the NPCPL building blocks that allow the multiplier to sustain throughputs in excess of 600 MHz. The methodology can be extended to introduce wave pipelining in other circuits as well  相似文献   

18.
介绍一种异步可重构结构,研究了异步可重构单元的设计。通过提前产生求值完成信号,使用DSDCVS逻辑实现可重构单元的运算电路,改进了异步可重构单元的控制电路。用三输入的C元件实现异步可重构单元的控制电路。仿真结果表明,异步可重构结构具有低功耗、高性能的优点,适合作为IP集成到系统芯片上,组成低功耗、高性能的可重构计算平台。  相似文献   

19.
20.
杨斐  黄军  康浩 《现代电子技术》2013,(22):143-146
利用EDA技术,在可编程逻辑器件CPLD上实现了一种多功能电子密码锁。为弥补传统密码锁的不足,进一步提高可靠性,该系统中所有数据的存储、运算都完全由硬件实现。利用VHDL语言对电路进行行为描述,QuartusⅡ软件中的EDA工具进行仿真及下载。整个设计过程采用自顶向下方案,设计效率高,开发成本低。采用了MAXⅡ系列的CPLD作为硬件核心,其功耗低,逻辑执行速度远高于单片机,在安防行业中有较强的市场竞争力。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号