Similar literature
1.
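Presents the design of an asynchronous ACS (adder-comparator-selector) unit for Viterbi decoders. Asynchronous handshake signals replace the global clock of a synchronous circuit. Circuits are given for an asynchronous adder unit, an asynchronous comparator unit, and an asynchronous selector unit. An asynchronous 4-bit ACS was designed with a full-custom methodology and verified by fabrication in a 0.6 μm CMOS process. In testing, the chip consumed 75.5 mW at a 5 V supply voltage and a 20 MHz operating frequency. Because control is asynchronous, the chip consumes no dynamic power while idle in its "sleep" state. The average response time is 19.18 ns, only 82% of the worst-case response time of 23.37 ns. A comparison with simulation results for a synchronous 4-bit ACS in the same process, covering both power and performance, shows that the asynchronous ACS has the advantage.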
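
For readers unfamiliar with the add-compare-select step named in this abstract, the following is a minimal behavioural sketch in Python. It is not the circuit from the paper: the function name `acs`, the saturating arithmetic, and the tie-breaking rule are assumptions made only for illustration, with the 4-bit width taken from the abstract.

```python
def acs(pm_a: int, bm_a: int, pm_b: int, bm_b: int, width: int = 4):
    """One add-compare-select step for a single trellis state.

    pm_a, pm_b: path metrics of the two predecessor states
    bm_a, bm_b: branch metrics of the two incoming branches
    Returns (survivor_metric, decision); decision is 0 if branch A
    survives and 1 if branch B survives.
    """
    limit = (1 << width) - 1                 # assumed: saturate to the metric width
    cand_a = min(pm_a + bm_a, limit)         # add
    cand_b = min(pm_b + bm_b, limit)
    decision = 0 if cand_a <= cand_b else 1  # compare (assumed: ties favour A)
    return (cand_a, cand_b)[decision], decision  # select


if __name__ == "__main__":
    print(acs(3, 2, 4, 2))
    print(acs(9, 3, 2, 1))
```

In a Viterbi decoder the smaller candidate metric is kept as the survivor and the decision bit is stored for traceback; a hardware ACS performs the two additions in parallel rather than in sequence.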

2.
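Design and implementation of standard cells for asynchronous integrated circuits   Cited by: 1 (self-citations: 1, citations by others: 0)
Zhao Bing (赵冰), Qiu Yulin (仇玉林), Hei Yong (黑勇). 《电子器件》 (Chinese Journal of Electron Devices), 2005, 28(2): 346-348, 351
Describes the classification, circuit design methods, and circuit structures of the asynchronous standard cells commonly used in asynchronous integrated circuit design. The design and implementation of the C-element and of the asynchronous datapath are covered in detail, and circuits are proposed for an asynchronous adder unit, an asynchronous comparator unit, and an asynchronous selector unit. Using these asynchronous standard cells, an asynchronous ACS (adder-comparator-selector) for Viterbi decoders was built and verified by fabrication in a 0.6 μm CMOS process. At a 5 V supply voltage and a 20 MHz operating frequency the chip consumes 75.5 mW. The average response time is 19.18 ns, only 82% of the worst-case response time of 23.37 ns. This verifies the correctness of the asynchronous standard cells and the performance advantage of asynchronous circuits over synchronous ones.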

3.
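Presents the design of a new asynchronous ACS (adder-comparator-selector). It describes an asynchronous comparator with an asynchronous implementation structure, and builds the asynchronous ACS by asynchronously interconnecting the adder, comparator, and selector units. The performance analysis of the asynchronous ACS uses a new method based on a multi-delay model: multi-delay models of the asynchronous adder and comparator are constructed, and logic simulation gives an average response time of 3.66 ns and a longest response time of 8.1 ns. The asynchronous ACS therefore has a performance advantage over a synchronous ACS.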

4.
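This paper designs an asynchronous datapath for an LDPC decoder, using asynchronous circuits to reduce both the glitches caused by mismatched signal arrival times and the power consumed by the clock. The main arithmetic units in the datapath are designed around the statistical properties of the input data, which reduces redundant computation. A synchronous datapath and a clock-gated datapath are also implemented for comparison. The three designs use similar architectures and implement the same function in a 0.18 μm CMOS process. Simulation results show that the proposed asynchronous design consumes the least power, saving 42.0% relative to the synchronous design and 32.6% relative to the clock-gated design. Its performance is slightly below that of the synchronous design but better than that of the clock-gated design: the delay is 1.09 ns for the synchronous design, 1.61 ns for the clock-gated design, and 1.20 ns for the asynchronous design.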

5.
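Jiang Xiaobo (姜小波), Ye Desheng (叶德盛). 《电子学报》 (Acta Electronica Sinica), 2012, 40(8): 1650-1654
Exploiting the statistical properties of the input data, this paper designs two low-power asynchronous comparators: an asynchronous ripple comparator and an early-termination asynchronous comparator. The asynchronous ripple comparator stops computing at the first unequal bit position but must still propagate the result to the least significant bit, which costs some power. The early-termination asynchronous comparator uses a modified truth table, a new comparison cell circuit, and a termination detection circuit to stop computing at the first unequal bit position and output the comparison result immediately, saving unnecessary power. The new asynchronous comparators and the synchronous comparators used for comparison (a BCL comparator and a clock-gated comparator) are all implemented in the SMIC 0.18 μm process. Simulation results show that the early-termination asynchronous comparator consumes the least power: compared with the synchronous BCL comparator and the clock-gated comparator, under random data and under data from an LDPC decoder, it saves 87.1% and 84.5%, and 37.5% and 28.6%, of the power, respectively.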
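
The early-termination idea in this abstract can be illustrated with a short behavioural model in Python. This is only a sketch of the principle (scan from the most significant bit and stop at the first unequal position), not the authors' truth table or circuit; the function name `compare_msb_first` and the returned count of examined bits are assumptions introduced for illustration.

```python
def compare_msb_first(a: int, b: int, width: int):
    """Compare two unsigned `width`-bit values, most significant bit first.

    Returns (result, bits_examined): result is 1 if a > b, -1 if a < b,
    and 0 if equal. bits_examined is how many bit positions were inspected
    before the outcome was known, a rough proxy for switching activity.
    """
    for examined, i in enumerate(range(width - 1, -1, -1), start=1):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        if bit_a != bit_b:
            # The first unequal bit decides the comparison; stop here.
            return (1 if bit_a else -1), examined
    return 0, width  # all bits equal: every position had to be inspected


if __name__ == "__main__":
    print(compare_msb_first(0b1010, 0b0110, 4))
    print(compare_msb_first(0b1010, 0b1011, 4))
```

When operands usually differ in a high-order bit, few positions are examined, which is the kind of data-dependent saving the abstract attributes to the statistics of the input data.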

6.
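Zhao Bing (赵冰), Qiu Yulin (仇玉林), Lü Tieliang (吕铁良), Hei Yong (黑勇). 《微电子学》 (Microelectronics), 2006, 36(4): 396-399
Presents a fast Fourier transform processor with an asynchronous implementation structure, in which control uses local handshake signals in place of the conventional system clock. The circuit structure of the processor's asynchronous adder is given, and an asynchronous multiplier with a Booth-decoded Wallace tree structure is designed. Circuit simulation of an 8-point asynchronous fast Fourier transform processor gives an average response time of 31.15 ns per transform, only 72.7% of the worst-case response time of 42.85 ns. The asynchronous fast Fourier transform processor therefore has a performance advantage over a synchronous processor.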

7.
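The tree arbiter is a commonly used circuit in asynchronous designs, and its performance and robustness strongly affect the whole system. To address problems in the design and application of earlier tree arbiters, a new asynchronous tree arbiter is designed and implemented, improving the robustness of asynchronous tree arbiters. The arbiter inserts a differential circuit and isolates the two logic stages from each other, which avoids glitches. A redesigned C-element avoids the deadlock problem of existing tree arbiters. In the CSMC 0.5 μm CMOS process, the arbiter's shortest data transfer time is 4.37 ns and its average power consumption is 50.815 nW.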

8.
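Presents a fifth-order Chebyshev transconductance-capacitor (Gm-C) bandpass filter built from Nauta transconductor cells, together with its tuning circuit. The circuit is intended for the RF front end of a low-IF BeiDou satellite navigation receiver. The filter's center frequency is 4.092 MHz and its bandwidth is designed to be ±2.046 MHz. The filter uses an on-chip automatic frequency tuning circuit with a phase-locked-loop structure and is implemented in the TSMC 0.13 μm RF CMOS process. The chip area is only 0.24 mm², the circuit can operate at low supply voltage, and its total power consumption is only 1.68 mW.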

9.
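Zhao Bing (赵冰), Qiu Yulin (仇玉林), Lü Tieliang (吕铁良), Hei Yong (黑勇). 《电子器件》 (Chinese Journal of Electron Devices), 2006, 29(3): 613-616
For a fast Fourier transform processor with an asynchronous implementation structure, the circuit of the processor's asynchronous adder and the structure of its asynchronous multiplier are given. The asynchronous fast Fourier transform processor uses local handshake signals in place of the conventional global clock. Circuit simulation of an 8-point asynchronous fast Fourier transform processor gives an average response time of 31.15 ns, only 72.7% of the worst-case response time of 42.85 ns. The asynchronous fast Fourier transform processor therefore has a performance advantage over a synchronous processor.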

10.
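DES is a typical symmetric encryption algorithm, widely used in commercial applications such as smart cards. Implementations of symmetric algorithms are almost all based on synchronous circuit design. This paper implements the DES algorithm with an asynchronous circuit method based on the four-phase bundled-data protocol. The synchronous and asynchronous designs of DES were both placed in a contactless smart card chip, and the two designs were compared comprehensively through chip testing, showing that the symmetric algorithm implemented with asynchronous circuits has significant advantages in power consumption and computing speed.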
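
The four-phase bundled-data protocol named in this abstract follows a fixed return-to-zero event order: data valid, request high, acknowledge high, request low, acknowledge low. The Python sketch below only records that order for a list of items; it models no timing and none of the paper's circuitry, and the function name `four_phase_transfer` and the trace format are hypothetical.

```python
def four_phase_transfer(data_items):
    """Return (trace, received) for sending items over one four-phase
    bundled-data channel.

    trace is the ordered list of (signal, value) events; received is the
    list of items latched by the receiver.
    """
    trace = []
    received = []
    for item in data_items:
        trace.append(("data", item))  # sender drives the data wires first
        trace.append(("req", 1))      # request rises: bundled data is valid
        received.append(item)         # receiver latches the data
        trace.append(("ack", 1))      # acknowledge rises
        trace.append(("req", 0))      # sender withdraws the request
        trace.append(("ack", 0))      # receiver returns to idle
    return trace, received


if __name__ == "__main__":
    events, latched = four_phase_transfer([0xA5])
    for event in events:
        print(event)
```

The bundling constraint, that data must be stable before the request edge reaches the receiver, is a timing assumption of the protocol that this event list cannot express.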

11.
As technology evolves into the deep submicron level, synchronous circuit designs based on a single global clock have incurred problems in such areas as timing closure and power consumption. An asynchronous circuit design methodology is one of the strong candidates to solve such problems. To verify the feasibility and efficiency of a large-scale asynchronous circuit, we design a fully clockless 32-bit processor. We model the processor using an asynchronous HDL and synthesize it using a tool specialized for asynchronous circuits with a top-down design approach. In this paper, two microarchitectures, basic and enhanced, are explored. The results from a pre-layout simulation utilizing 0.13-μm CMOS technology show that the performance and power consumption of the enhanced microarchitecture are respectively improved by 109% and 30% with respect to the basic architecture. Furthermore, the measured power efficiency is about 238 μW/MHz and is comparable to that of a synchronous counterpart.

12.
AMULET2e: an asynchronous embedded controller   Cited by: 5 (self-citations: 0, citations by others: 5)
AMULET2e is an embedded system chip incorporating a 32-bit ARM-compatible asynchronous processor core, a 4-Kb pipelined cache, a flexible memory interface with dynamic bus sizing, and assorted programmable control functions. Many on-chip performance-enhancing and power-saving features are switchable, enabling detailed experimental analysis of their effectiveness. AMULET2e silicon demonstrates competitive performance, power efficiency, and ease of system design, and it includes innovative features that exploit its asynchronous operation to advantage in applications that require low standby power and/or freedom from the electromagnetic interference generated by system clocks.

13.
This paper presents a novel variable-latency multiplier architecture, suitable for implementation as a self-timed multiplier core or as a fully synchronous multicycle multiplier core. The architecture combines a second-order Booth algorithm with a split carry save array pipelined organization, incorporating multiple row skipping and completion-predicting carry-select dual adder. The paper reports the architecture and logic design, CMOS circuit design and performance evaluation. In 0.35 μm CMOS, the expected sustainable cycle time for a 32-bit synchronous implementation is 2.25 ns. Instruction level simulations estimate 54% single-cycle and 46% two-cycle operations in SPEC95 execution. Using the same CMOS process, the 32-bit asynchronous implementation is expected to reach an average 1.76 ns throughput and 3.48 ns latency in SPEC95 execution.
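
As background for the "second-order Booth algorithm" mentioned above, the sketch below shows standard radix-4 Booth recoding in Python. It is a textbook formulation and not the architecture from the paper; the function names and the choice to pad odd widths by one sign bit are assumptions made for illustration.

```python
def booth_radix4_digits(y: int, bits: int) -> list:
    """Recode a two's-complement multiplier of `bits` width into radix-4
    Booth digits in {-2, -1, 0, 1, 2}, least significant digit first."""
    if bits % 2:
        bits += 1  # assumed: sign-extend odd widths by one bit
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1        # Python's >> sign-extends negative ints
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int) -> int:
    """Multiply by summing one partial product per Booth digit."""
    return sum(d * x * 4 ** k
               for k, d in enumerate(booth_radix4_digits(y, bits)))


if __name__ == "__main__":
    print(booth_radix4_digits(-3, 8))
    print(booth_multiply(7, -3, 8))
```

Each zero digit corresponds to a partial-product row that contributes nothing to the sum, which gives data-dependent work and hence variable latency; how the paper's row skipping actually exploits this is not stated in the abstract.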

14.
A 135K transistor, uniformly pipelined 50-MHz CMOS 64-bit floating-point arithmetic processor chip is described. The execution unit is capable of sustaining pipelined performance of one 32-bit or 64-bit result every 20 ns for all operations except double-precision multiply (40 ns) and divide. The chip employs an exponent difference prediction scheme and a unified leading-one and sticky-bit computation logic for the addition and subtraction operations. A hardware multiplier using a radix-8 modified Booth algorithm and a divider using a radix-2 SRT algorithm are employed.

15.
The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical single-precision multiply-add operations. Since exact rounding, exceptions, and denormal number handling are not important to multimedia applications, IEEE correctness on the single-precision floating-point numbers is sacrificed for performance and simple design. It employs fine-grained clock gating for power saving. The design has 768K transistors in 1.3 mm², fabricated in 90-nm SOI technology. Correct operations have been observed up to 5.6 GHz with 1.4 V and 56 °C, delivering 44.8 GFlops. Architecture, logic, circuits, and integration are codesigned to meet the performance, power, and area goals.
