Similar Documents
1.
Pipelining is a widely used technique for raising the operating frequency of digital signal processing (DSP) applications by reducing the critical path of circuits. Conventionally, however, the critical path is estimated with a discrete-component timing model in terms of the times required to compute additions and multiplications, treating each arithmetic circuit as a discrete component, and pipeline registers are inserted between arithmetic circuits to reduce this estimated critical path. In this paper, we show that such architecture-level pipelining, based on the discrete-component timing model, very often does not reduce the critical path significantly, while it does increase latency and register complexity. To derive greater benefit from pipelining, the propagation delays of different combinational sections can be evaluated precisely at the gate level, or at least at the level of one-bit adders; the critical path can then be reduced by placing pipeline registers seamlessly across the combinational datapath, without restricting them to positions between arithmetic circuits. We present an adequately precise evaluation of propagation delays across a combinational path, viewed as a network of arithmetic circuits, based on a seamless view of signal propagation. Using this precise delay information, we identify the best locations for pipeline registers to reduce the critical path to the desired limit. The proposed seamless pipelining approach is found to achieve the desired acceleration of DSP applications without significant pipeline overhead in terms of latency and register complexity.
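As a rough illustration of the seamless idea, the sketch below models a combinational path as a linear chain of fine-grained stages (for example, one-bit adder cells) with known integer delays, and greedily places a register wherever the accumulated delay would exceed a target clock period. The linear-chain model, the names `place_registers` and `segment_delays`, and the delay values are assumptions for illustration, not the paper's algorithm.

```python
from typing import List


def place_registers(delays: List[int], target: int) -> List[int]:
    """Greedily choose pipeline cut points along a linear chain of fine-grained stages.

    delays: propagation delay of each stage (e.g., each one-bit adder cell) in path order.
    target: desired maximum delay between two consecutive pipeline registers.
    Returns indices i meaning "insert a register after stage i".
    """
    if any(d > target for d in delays):
        raise ValueError("a single stage is slower than the target and cannot be split here")
    cuts: List[int] = []
    acc = 0
    for i, d in enumerate(delays):
        if acc + d > target:
            cuts.append(i - 1)  # close the current segment just before stage i
            acc = 0
        acc += d
    return cuts


def segment_delays(delays: List[int], cuts: List[int]) -> List[int]:
    """Delay of each pipeline segment for a given set of register positions."""
    segments, start = [], 0
    for c in cuts + [len(delays) - 1]:
        segments.append(sum(delays[start:c + 1]))
        start = c + 1
    return segments


# Hypothetical path of ten unit-delay one-bit adder cells, target period of 4 units.
path = [1] * 10
cuts = place_registers(path, 4)
print(cuts, segment_delays(path, cuts))  # prints [3, 7] [4, 4, 2]
```

Because cuts are chosen at one-bit-cell granularity, a register can land inside what the discrete-component model treats as a single indivisible adder, which is the freedom the seamless approach exploits.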

2.
The number of adders and the critical path in the multiplier block of a multiple-constant-multiplication-based implementation of a finite impulse response (FIR) filter can be minimized with common subexpression elimination (CSE) techniques. A two-bit common subexpression (CS) can be located recursively in a non-canonic signed-digit (CSD) representation of the filter coefficients. This paper presents an efficient algorithm that improves the elimination of CSs from the multiplier block of an FIR filter, so that it can be realized with fewer adders and low logic depth compared with existing CSE methods in the literature. Vinod et al. claimed the highest reduction in the number of logical operators (LOs) without increasing the logic depth (LD) requirement. Using the design examples given by Vinod et al., we compare the average reductions in LOs and LDs achieved by our algorithm. For the three design examples, our algorithm shows average LO improvements of 30.8%, 5.5%, and 22.5% over Vinod et al., with a comparable LD requirement. The improvement grows with filter order: for the highest filter order and lowest coefficient width, the LO improvements are 70.3%, 75.3%, and 72.2% for the three design examples.
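For readers unfamiliar with signed-digit coefficients, the sketch below converts an integer to canonic signed-digit form and counts how often each two-nonzero-digit pattern (distance, relative sign) occurs across a coefficient set; recurring pairs are the candidates that CSE can share. The function names and the simple pair-counting rule are illustrative assumptions, not the algorithm of the paper.

```python
from collections import Counter
from itertools import combinations
from typing import List


def to_csd(n: int) -> List[int]:
    """Canonic signed-digit form of an integer, least significant digit first.

    Digits are in {-1, 0, 1} and no two adjacent digits are nonzero.
    """
    digits: List[int] = []
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
        else:
            d = 2 - (n % 4)  # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            digits.append(d)
            n -= d
        n //= 2
    return digits


def csd_value(digits: List[int]) -> int:
    """Integer value of an LSB-first signed-digit list."""
    return sum(d << i for i, d in enumerate(digits))


def adders_without_sharing(coeffs: List[int]) -> int:
    """Adders/subtractors if each coefficient is built alone: (nonzero digits - 1) each."""
    return sum(max(sum(1 for d in to_csd(c) if d) - 1, 0) for c in coeffs)


def two_digit_patterns(coeffs: List[int]) -> Counter:
    """Count (distance, relative sign) patterns of nonzero digit pairs across all coefficients.

    A pattern seen more than once is a candidate common subexpression, e.g. (2, 1) is
    x + (x << 2). Overlapping occurrences cannot all be shared, so counts are an upper bound.
    """
    counts: Counter = Counter()
    for c in coeffs:
        nonzero = [(i, d) for i, d in enumerate(to_csd(c)) if d]
        for (i, di), (j, dj) in combinations(nonzero, 2):
            counts[(j - i, di * dj)] += 1
    return counts


print(to_csd(7))  # prints [-1, 0, 0, 1], i.e. 8 - 1
print(two_digit_patterns([5, 21, 13]))
```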

3.
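In circuits whose design for testability is based on built-in self-test (BIST), the accumulator is a commonly used basic building block, for example in the arithmetic and logic circuits of general-purpose processors and digital signal processing circuits. Taking the Booth multiplier as an example, this paper describes several common forms of BIST output response analysis that use accumulator circuits, and compares them in terms of fault coverage, hardware overhead, and delay.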
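A minimal sketch of accumulator-based output response analysis, assuming the simplest scheme: each circuit response is added into an n-bit accumulator modulo 2^n, and the final value is compared with the fault-free signature. The toy circuit (a plain 8x8 multiplier rather than a Booth multiplier, with a hypothetical stuck-at-0 fault on one product bit), the `random.Random` pattern source, and the 16-bit width are assumptions; the paper's specific compaction variants are not reproduced.

```python
import random
from typing import Iterable, Optional


def accumulate_signature(responses: Iterable[int], width: int = 16) -> int:
    """Compact a stream of responses into an accumulator signature (mod 2**width)."""
    mask = (1 << width) - 1
    acc = 0
    for r in responses:
        acc = (acc + r) & mask
    return acc


def multiplier(a: int, b: int, stuck_bit: Optional[int] = None) -> int:
    """8x8 unsigned multiply; optionally force one product bit to 0 (toy stuck-at-0 fault)."""
    p = a * b
    if stuck_bit is not None:
        p &= ~(1 << stuck_bit)
    return p & 0xFFFF


def run_bist(n_patterns: int = 256, seed: int = 1, stuck_bit: Optional[int] = None) -> int:
    """Apply pseudo-random patterns and return the accumulated signature."""
    rng = random.Random(seed)
    patterns = [(rng.randrange(256), rng.randrange(256)) for _ in range(n_patterns)]
    return accumulate_signature(multiplier(a, b, stuck_bit) for a, b in patterns)


good = run_bist()
faulty = run_bist(stuck_bit=3)
# Aliasing (a missed fault) happens only if the summed errors are 0 mod 2**width.
print(hex(good), hex(faulty), "fault detected:", good != faulty)
```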

4.
This paper presents a new design methodology for reducing the complexity of application-specific finite impulse response (FIR) digital filters. A new adder-graph data structure, the multiroot binary partition graph (MBPG), is proposed to formulate the multiple constant multiplication problem of FIR filter design. The set of coefficients, in any fixed-point representation, is partitioned into symbols so that identifying and eliminating common subexpressions becomes congruent to information parsing for data compression. A minimum number of distinct pairs or groups of symbols and residues can be used to code a set of coefficients, based on their probability and conditional probability of occurrence. This concept allows entropy to be used as a quantitative measure of the coding density of different compositions of symbols for a set of coefficients. The minimal-vertex-set MBPG synthesized by our information-theoretic approach gives a direct correspondence between vertices and adders, and between edges and physical interconnections. Unlike common subexpression elimination algorithms based on other graph data structures, the symbol-level information carried in each vertex and the graph isomorphism of the MBPG promise further fine-grained optimization in a reduced search space. One such optimization exploited in this paper is shift-inclusive computation reordering, which minimizes the width of every two's complement adder to further reduce the implementation cost and the critical path delay of the filter. Experimental results show that the proposed algorithm achieves up to 19.30% reduction in logic complexity and up to 61.03% reduction in critical path delay compared with other minimization methods.
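To make the information-theoretic view concrete, the sketch below splits each coefficient's binary representation into fixed-width symbols and computes the Shannon entropy of the nonzero-symbol distribution; lower entropy means fewer distinct symbols recur more often, suggesting more sharing opportunities. The fixed 2-bit symbol width, the unsigned binary encoding, and the function names are simplifying assumptions; the MBPG construction itself is not reproduced.

```python
import math
from collections import Counter
from typing import List


def symbols(coeff: int, width: int, sym_bits: int = 2) -> List[int]:
    """Split an unsigned `width`-bit coefficient into `sym_bits`-bit symbols, LSB first."""
    mask = (1 << sym_bits) - 1
    return [(coeff >> s) & mask for s in range(0, width, sym_bits)]


def symbol_entropy(coeffs: List[int], width: int = 8, sym_bits: int = 2) -> float:
    """Shannon entropy (bits per symbol) of the nonzero symbols across a coefficient set."""
    counts = Counter(s for c in coeffs for s in symbols(c, width, sym_bits) if s != 0)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(symbols(0b10110110, 8))           # prints [2, 1, 3, 2]
print(symbol_entropy([0b01010101]))     # a single repeated symbol: entropy 0
print(symbol_entropy([0b00011011]))     # three equally likely symbols: log2(3)
```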

5.
We propose a common subexpression elimination (CSE) method for the synthesis of fixed-point finite impulse response (FIR) filters. The proposed CSE algorithm considers both the redundancy among the canonic signed-digit (CSD) filter coefficients and the length of the critical path in the multiplier block of a transposed-form FIR filter, so it can trade off complexity against throughput. The number of adders synthesized by our method is comparable to that of graph-dependence algorithms, while our method can synthesize a high-order, complex FIR filter in a few seconds.
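The transposed form matters here because every tap multiplies the same current input sample, so all products come from one shared multiplier block (where CSE applies), and the critical path is the multiplier-block output plus one structural adder. A minimal behavioral sketch, with integer coefficients and the function names as assumptions, checked against direct convolution:

```python
from typing import List


def fir_transposed(x: List[int], h: List[int]) -> List[int]:
    """Transposed-form FIR filter, y[n] = sum_k h[k] * x[n - k], with zero initial state."""
    regs = [0] * (len(h) - 1)  # structural-adder delay registers
    y: List[int] = []
    for sample in x:
        # Multiplier block: every tap multiplies the same input sample, so the products
        # (and any shared subexpressions inside them) come from one block.
        products = [hk * sample for hk in h]
        y.append(products[0] + (regs[0] if regs else 0))
        # Each register takes the next product plus the register to its right
        # (updated left to right, so regs[i + 1] still holds its previous value).
        for i in range(len(regs)):
            right = regs[i + 1] if i + 1 < len(regs) else 0
            regs[i] = products[i + 1] + right
    return y


def fir_direct(x: List[int], h: List[int]) -> List[int]:
    """Reference direct-form convolution truncated to len(x) outputs."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


print(fir_transposed([1, 0, 0, 0], [3, -1, 2]))  # impulse response: prints [3, -1, 2, 0]
```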

6.
We propose a new matched filter architecture for chirp spread spectrum in IEEE 802.15.4a. By using relations among the four subchirps, the proposed architecture comprises four subfilters utilizing only a set of coefficients matched to the first subchirp. The four subfilters share adders and registers, and as a result, the required adders and registers for implementation are reduced.

7.
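To increase the speed of long adders and extend the operand width, an adder structure called the mixed-module top-level carry-cascade carry-lookahead adder (TC2CLA) is proposed. In this structure, CLA modules with M_i > 1 levels are cascaded through the carry of the top-level lookahead unit instead of through the bottom-level carry. Based on optimized CLA unit circuits and the standard gate delay t_pd, the module delay formula of the mixed-module TC2CLA is derived from the carry critical path, and the meaning of each term in the formula is explained. As a special case, the module delay formula of the identical-module TC2CLA is derived. It is further shown and proved that cascading modules in order of increasing level count gives the speed-optimal sequence among all sequences of the mixed-module TC2CLA: it has the shortest delay, while resource (area) usage and power consumption remain unchanged. This conclusion serves as a design rule for optimized design. A formula for the number of mixed-module cascade sequences and application examples are also given. The delay formulas of TC2CLA and CLA show that, for the same module sequence and without waiting for the (group) generate and propagate signals, the delay of the most significant carry and the maximum delay of the most significant sum bit are reduced.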
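The carry-lookahead recurrence behind both CLA papers in this list can be sketched at the bit level: with generate g_i = a_i AND b_i and propagate p_i = a_i XOR b_i, each group computes every carry directly as a flat sum of products of g, p, and the group carry-in, and the group generate/propagate pair (G, P) carries information between modules. The 4-bit group size, the simple module-to-module carry passing (rather than a multi-level TC2CLA), and the function names are assumptions for illustration.

```python
from typing import List, Tuple


def lookahead_carry(g: List[int], p: List[int], cin: int, i: int) -> int:
    """Carry into bit i of a group as a flat sum of products (no dependence on c_{i-1}):

    c_i = g_{i-1} | p_{i-1} g_{i-2} | ... | p_{i-1} ... p_0 cin
    """
    carry = cin
    for k in range(i):
        carry &= p[k]
    for j in range(i):
        term = g[j]
        for k in range(j + 1, i):
            term &= p[k]
        carry |= term
    return carry


def cla_add(a: int, b: int, width: int = 16, group: int = 4) -> Tuple[int, int]:
    """Add two unsigned integers with carry-lookahead groups; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = 0, 0
    for base in range(0, width, group):
        ga = a_bits[base:base + group]
        gb = b_bits[base:base + group]
        g = [x & y for x, y in zip(ga, gb)]  # bit generate
        p = [x ^ y for x, y in zip(ga, gb)]  # bit propagate
        for i in range(len(g)):
            result |= (p[i] ^ lookahead_carry(g, p, carry, i)) << (base + i)
        G = lookahead_carry(g, p, 0, len(g))  # group generate
        P = int(all(p))                       # group propagate
        carry = G | (P & carry)               # carry into the next module
    return result, carry


print(cla_add(0xFFFF, 1))  # prints (0, 1)
```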

8.
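An FIR digital filter was designed and synthesized on an ASIC using a hardware description language. A subexpression-space technique effectively reduces the number of adders in multiple constant multiplication, and limiting the adder depth further eases implementation under high-throughput constraints. Synthesis results show that the method effectively reduces hardware area and is suitable for high-throughput, low-power digital system design.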

9.
Adders can be implemented efficiently in differential logic using a new generate signal (N) presented in this paper. This signal enables iterative shared-transistor structures to be built with better speed/area performance than a conventional implementation. It also allows adders developed in domino logic to be easily adapted to differential logic. Based on this signal, three 32-bit adders in differential cascode voltage switch (DCVS) logic, with completion circuits for self-timed applications, have been fabricated in a standard 1.0-μm two-level-metal CMOS technology: a ripple-carry (RC) adder, a carry look-ahead (CLA) adder, and a binary carry look-ahead (BCL) adder. The RC adder performs best for random input data, but its delay depends strongly on the length of the carry propagation path, so it is not recommended for circuits with nonrandom operands. The BCL adder is the fastest but costs a large chip area. The CLA adder is an intermediate option, with an area 20% greater than that of the RC adder. Its average delay is slightly greater than that of the other two adders, but its addition time increases only slowly with carry propagation length, even for adders with many bits.
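The data dependence of the ripple-carry adder's delay can be made concrete: in a self-timed adder with completion detection, the addition finishes once the longest carry chain has settled. The sketch below measures that longest chain (a carry generated at one bit and carried through consecutive propagate bits) and estimates its average over random operands; the chain-length definition, the Monte Carlo setup, and the function names are assumptions for illustration.

```python
import random


def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Length (in bit positions) of the longest carry chain when adding a and b with carry-in 0.

    A chain starts at a generate bit (a_i = b_i = 1) and extends through each following
    propagate bit (a_i != b_i); a kill bit (a_i = b_i = 0) ends it.
    """
    longest = run = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:
            run = 1              # generate: a new carry starts here
        elif (ai ^ bi) and run:
            run += 1             # propagate: the carry moves one more position
        else:
            run = 0              # kill, or propagate with no incoming carry
        longest = max(longest, run)
    return longest


def average_longest_chain(width: int = 32, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the longest chain for uniformly random operands."""
    rng = random.Random(seed)
    total = sum(longest_carry_chain(rng.getrandbits(width), rng.getrandbits(width), width)
                for _ in range(trials))
    return total / trials


print(longest_carry_chain(0xFFFFFFFF, 1))  # worst case: prints 32
print(average_longest_chain())             # typically far below the width
```

The gap between the worst case (the full width) and the small random-operand average is why a completion-detecting RC adder looks good on random data but poor on nonrandom operands.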

10.
This paper analyzes methods for minimizing the power-delay product of 64-bit carry-select adders intended for high-performance, low-power applications. A first realization in 0.18-μm partially depleted (PD) silicon-on-insulator (SOI), using complex branch-based logic (BBL) cells, has a delay of 720 ps and a power dissipation of 96 mW at 1.5 V. Reducing the stack height in the critical path, together with optimizing the global carry network through cell sharing and selecting 8-bit pre-sums, reduces the power-delay product by 75%. Automatic tuning of the transistor widths in 0.13-μm PD SOI yields an energy-efficient 64-bit adder with a delay of 326 ps and a power dissipation of only 23 mW at 1.1 V.
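The carry-select principle behind these 64-bit adders: each block precomputes its sum for both possible carry-ins (the pre-sums), and the real carry only drives a multiplexer. A behavioral sketch with 8-bit blocks, echoing the 8-bit pre-sums mentioned above; the block parameter and function name are assumptions, and no circuit-level detail (BBL cells, SOI, transistor sizing) is modeled.

```python
from typing import Tuple


def carry_select_add(a: int, b: int, width: int = 64, block: int = 8) -> Tuple[int, int]:
    """Behavioral carry-select adder: returns ((a + b) mod 2**width, carry_out)."""
    if width % block:
        raise ValueError("width must be a multiple of the block size")
    mask = (1 << block) - 1
    result, carry = 0, 0
    for base in range(0, width, block):
        x = (a >> base) & mask
        y = (b >> base) & mask
        pre_sum0 = x + y        # pre-sum assuming carry-in 0 (computed in parallel in hardware)
        pre_sum1 = x + y + 1    # pre-sum assuming carry-in 1
        chosen = pre_sum1 if carry else pre_sum0  # multiplexer driven by the real carry
        result |= (chosen & mask) << base
        carry = chosen >> block  # block carry-out selects the next block's pre-sum
    return result, carry


print(carry_select_add(2**64 - 1, 1))  # prints (0, 1)
```

Only the chain of block multiplexers is serial; the pre-sums of all blocks are independent of one another, which is what shortens the critical path relative to a ripple adder.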

11.
Static random access memory (SRAM)-based field programmable gate arrays (FPGAs) are widely used in aerospace, where their resistance to single event upsets (SEUs) is increasingly important. To improve the SEU resistance of FPGAs, this study triplicates the registers of the circuit netlist and divides them into three categories. A packing algorithm places the triple-modular-redundancy (TMR) registers into different configurable logic blocks (CLBs), while also accounting for the effect of large fan-out nets. Experimental results show that the algorithm successfully packs the TMR registers. Compared with Timing Versatile PACKing (TVPACK), when TMR is not considered, the algorithm reduces the number of nets on the critical path by 11% and the critical path delay by 12% on average. For some circuits, the critical path delay improves by about 33%.
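Two pieces of a TMR flow can be sketched: the majority voter that masks a single upset copy, and the packing constraint that the three copies of a register land in different CLBs, so that one upset CLB cannot corrupt two copies. The greedy first-fit packer, the CLB capacity, the `<name>_k` copy naming, and all function names are illustrative assumptions; the timing and fan-out costs considered by the paper's algorithm are not modeled.

```python
from typing import List, Set


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over the three register copies."""
    return (a & b) | (b & c) | (a & c)


def pack_tmr(registers: List[str], capacity: int) -> List[List[str]]:
    """First-fit packing of TMR copies <name>_0, <name>_1, <name>_2 into CLBs.

    Constraint: a CLB never holds two copies of the same original register.
    """
    clbs: List[List[str]] = []
    owners: List[Set[str]] = []  # original registers already represented in each CLB
    for reg in registers:
        for k in range(3):
            copy = f"{reg}_{k}"
            for clb, own in zip(clbs, owners):
                if len(clb) < capacity and reg not in own:
                    clb.append(copy)
                    own.add(reg)
                    break
            else:  # no existing CLB can take this copy: open a new one
                clbs.append([copy])
                owners.append({reg})
    return clbs


print(majority(0b1100, 0b1100, 0b0011))  # upset third copy is outvoted: prints 12
print(pack_tmr(["a", "b", "c"], capacity=3))
```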

12.
This paper proposes a BiCMOS wired-OR logic for high-speed multiple-input logic gates. The logic uses a bipolar wired-OR to avoid series-connected MOS transistors. Compared with conventional gates such as CMOS NOR, BiCMOS multiemitter logic, and CMOS wired-NOR logic, the BiCMOS wired-OR logic was the fastest when the number of inputs exceeded four and the supply voltage was 3.3 V. With eight inputs, it was also the fastest of the four when the fan-out was below 20, and more than twice as fast when the fan-out was less than 10. The BiCMOS wired-OR logic was applied to a 64-bit two-stage carry look-ahead adder fabricated in a 0.5-μm BiCMOS process technology. A critical path delay of 3.1 ns from an input to a sum output was obtained at a supply voltage of 3.3 V, which is 35% faster than conventional BiCMOS adders.

13.
Conventional precise adders incur long delay and high power consumption to obtain accurate results. Exploiting the error tolerance of applications such as multimedia, image processing, and machine learning, a number of recent works have proposed approximate adders that occasionally generate inaccurate results in exchange for reduced delay and power consumption. However, most existing approximate adders have a large relative error, and when applied to 2's complement signed addition, they sometimes generate a wrong sign bit. In this paper, we propose a novel approximate adder that exploits the generate signals for carry speculation. Furthermore, we introduce a low-overhead module to reduce the relative error and a sign correction module to fix the sign error. Compared with the conventional ripple carry adder and carry-lookahead adder, our adder with a block size of 4 reduces the power-delay product by 66% and 32%, respectively, for a 32-bit addition. Compared with existing approximate adders, our adder significantly reduces the maximal relative error and ensures correct sign calculation with comparable area, delay, and power consumption. We further tested our adders, with and without the sign error correction module, in three real applications: mean filtering, edge detection, and k-means clustering. The experimental results demonstrate the importance of reducing the relative error and ensuring correct sign calculation for 2's complement signed additions. The outputs produced using our adder with the sign error correction module are very close to those produced using an accurate adder.
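A generic block-based carry-speculation adder, shown only to illustrate the accuracy/delay trade-off: the carry into each block is guessed from the previous block's generate information alone (its carry-out assuming it received no carry-in), so any carry chain that must cross a fully propagating block is cut off. This is a common speculation scheme assumed for illustration, not the paper's exact design; its relative-error reduction and sign-correction modules are not reproduced, and the names are hypothetical.

```python
def speculative_add(a: int, b: int, width: int = 32, block: int = 4) -> int:
    """Approximate sum (mod 2**width) with per-block carry speculation.

    The carry into each block is taken to be the previous block's group generate,
    i.e. its carry-out assuming it received no carry-in; longer chains are cut off.
    """
    mask = (1 << block) - 1
    result = 0
    prev_x = prev_y = 0
    for base in range(0, width, block):
        x = (a >> base) & mask
        y = (b >> base) & mask
        cin = (prev_x + prev_y) >> block  # speculated carry (0 for the first block)
        result |= ((x + y + cin) & mask) << base
        prev_x, prev_y = x, y
    return result


print(speculative_add(3, 4))                    # short chain, exact: prints 7
print(hex(speculative_add(0x0FF, 0x001, 12)))   # chain crosses a propagating block: prints 0x0
```

In the second call the exact sum is 0x100, an error of the full most significant block weight, which is the kind of large relative error the paper's correction module targets.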

14.
In this paper, we propose a systematic design methodology in the category of the hybrid-CMOS logic style. A large library of circuits suitable for low-power and high-speed applications can be obtained with the proposed methodology. The methodology was previously used to design XOR/XNOR circuits, where it demonstrated the merits of the new design features. Here we ask whether the method can also be used to design the Carry function and its complement (Carry and InverseCarry), the third important module of a full adder, and to what extent the answer moves toward a general systematic design. As before, all the presented designs have high driving capability, balanced full-swing outputs with fewer glitches, and a small number of transistors. They also have only one pass transistor in the critical path, which gives low propagation delay and high drivability. Hybrid-CMOS full adders can be divided into three modules, namely SUM, Carry, and XOR, and optimizing these modules reduces the power consumption, delay, and transistor count of full adders. Therefore, by embedding the balanced full-swing circuits in the carry module, 11 new full adder circuits can be expected to achieve high performance. Simulation results in the proposed realistic test bench show that the proposed circuits outperform previously suggested circuits, with 24–126% improvement in the power-delay product (PDP) and 57–82% improvement in area. All simulations were performed with TSMC 0.13-μm technology in a new full adder test bench, using HSPICE to achieve the minimum PDP.
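The three-module decomposition of a hybrid-CMOS full adder can be checked at the logic level: an XOR/XNOR module produces h = a XOR b, the SUM module computes h XOR cin, and the Carry module selects cin when h = 1 and a otherwise (a common multiplexer formulation, since a = b whenever h = 0). The mux form of Carry and the function name are assumptions for illustration; transistor-level design is outside what this sketch models.

```python
from itertools import product
from typing import Tuple


def full_adder_modules(a: int, b: int, cin: int) -> Tuple[int, int, int]:
    """Logic-level full adder built from XOR, SUM and Carry modules: (sum, carry, inverse carry)."""
    h = a ^ b                  # XOR module (its complement would be the XNOR output)
    s = h ^ cin                # SUM module
    carry = cin if h else a    # Carry module: mux controlled by the XOR/XNOR signals
    carry_n = 1 - carry        # InverseCarry
    return s, carry, carry_n


# Exhaustive check against binary addition over all eight input combinations.
for a, b, cin in product((0, 1), repeat=3):
    s, carry, carry_n = full_adder_modules(a, b, cin)
    assert s + 2 * carry == a + b + cin and carry_n == 1 - carry
print("all 8 input combinations match a + b + cin")
```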

15.
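Based on fast convolution algorithms, this paper proposes a parallel architecture for linear-phase FIR filters. The architecture uses fast convolution to reduce the number of subfilters while giving as many subfilters as possible symmetric coefficients, and then exploits coefficient symmetry to reduce the number of multipliers in the subfilter modules. For FIR filters with symmetric coefficients, the proposed parallel architecture saves a large amount of hardware compared with existing parallel FIR structures, especially when the number of taps is large. Specifically, for a 4-parallel 144-tap FIR filter, the proposed architecture saves 36 multipliers (14.3%), 23 adders (6.6%), and 35 delay elements (11.0%) compared with the improved fast FIR algorithm (FFA) structure.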
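The core of fast FIR algorithms is visible in the smallest case. For 2-parallel processing, the input and filter are split into even and odd polyphase components, and the three subfilter products H0X0, H1X1, and (H0+H1)(X0+X1) replace the four products of the direct approach, with the odd outputs recovered as (H0+H1)(X0+X1) - H0X0 - H1X1. The sketch below (pure Python, function names assumed) verifies the 2-parallel FFA against direct convolution; the paper's 4-parallel, symmetry-exploiting structure is not reproduced.

```python
from typing import List


def conv(a: List[int], b: List[int]) -> List[int]:
    """Direct linear convolution."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a: List[int], b: List[int]) -> List[int]:
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]


def poly_sub(a: List[int], b: List[int]) -> List[int]:
    return poly_add(a, [-v for v in b])


def ffa2(x: List[int], h: List[int]) -> List[int]:
    """Linear convolution x * h computed with a 2-parallel FFA (three half-length subfilters)."""
    x0, x1 = x[0::2], x[1::2]
    h0, h1 = h[0::2], h[1::2]
    p0 = conv(h0, x0)                             # subfilter H0
    p1 = conv(h1, x1)                             # subfilter H1
    p2 = conv(poly_add(h0, h1), poly_add(x0, x1))  # subfilter H0 + H1
    y_even = poly_add(p0, [0] + p1)               # H0X0 + z^-1 H1X1 (in the z^2 domain)
    y_odd = poly_sub(poly_sub(p2, p0), p1)        # H0X1 + H1X0
    y = [0] * (len(x) + len(h) - 1)
    for i, v in enumerate(y_even):
        if 2 * i < len(y):
            y[2 * i] += v
    for i, v in enumerate(y_odd):
        if 2 * i + 1 < len(y):
            y[2 * i + 1] += v
    return y


print(ffa2([1, 2, 3, 4], [5, 6, 7]))  # prints [5, 16, 34, 52, 45, 28]
```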

16.
This article presents a method to map digit-recurrence arithmetic algorithms to lookup-table based Field Programmable Gate Arrays (FPGAs). By reducing the number of binary inputs to combinational logic and merging algorithm steps, the strategy creates new simplified functions to decrease logic depth and area. To illustrate this method, a radix-2 digit-recurrence division algorithm is mapped to the Xilinx XC4010, a lookup-table based FPGA. The mapping develops a linear sequential array design that avoids the common problem of large fanout delay in the critical path. This approach has a cycle time independent of precision while requiring approximately the same number of logic blocks as a conventional design.
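A radix-2 digit-recurrence divider produces one quotient bit per iteration from a shift-and-subtract recurrence on the partial remainder, so each iteration does the same fixed amount of work regardless of precision. Below is a minimal restoring-division sketch for unsigned integers; the restoring variant and the names are assumptions (the paper maps its own radix-2 algorithm onto XC4010 lookup tables, which is not modeled).

```python
from typing import Tuple


def restoring_divide(dividend: int, divisor: int, n_bits: int = 16) -> Tuple[int, int]:
    """Radix-2 restoring division of unsigned integers: one quotient bit per iteration."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << n_bits):
        raise ValueError("dividend must fit in n_bits unsigned bits")
    remainder, quotient = 0, 0
    for i in range(n_bits - 1, -1, -1):
        # Recurrence step: shift the partial remainder and bring in the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor
        if trial >= 0:          # quotient digit 1: keep the subtracted remainder
            remainder = trial
            quotient |= 1 << i
        # otherwise the quotient digit is 0 and the remainder is "restored" (left unchanged)
    return quotient, remainder


print(restoring_divide(100, 7))  # prints (14, 2)
```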

17.
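Mixed-Module Delay Formula and Optimized Sequence for Carry-Lookahead Adders   Total citations: 2, self-citations: 2, citations by others: 2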
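To extend the operand width, a more general long-adder structure, the mixed-module cascaded carry-lookahead adder, is proposed. Based on optimized carry-lookahead adder (CLA) unit circuits and a standard gate delay model, the module delay formula of the mixed-module cascaded CLA is derived from the carry critical path, and the meaning of each term in the formula is explained. As a special case, the module delay formula of the identical-module cascaded CLA follows naturally. It is further shown and proved that cascading modules in order of increasing level count gives the speed-optimal sequence among all sequences of the mixed-module cascaded CLA: it has the shortest delay, while resource (area) usage and power consumption remain unchanged. This conclusion serves as a design rule for optimized design. A formula for the number of cascade sequences and application examples are also given.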

18.
Emerging trends in digital very large scale integration (VLSI) signal processing can reduce the cost of cochlear implants. Digital signal processing algorithms are used repeatedly in speech processors for filtering and encoding, and the critical paths in these algorithms limit the performance of the speech processors. These algorithms must therefore be transformed to suit processors designed for high speed, small area, and low power, which can be achieved by basing the design of the processors' auditory filter banks on digital VLSI signal processing concepts. Applying a folding algorithm to the second-order digital gammatone filter (GTF) reduces the number of multipliers from five to one and the number of adders from three to one, without changing the characteristics of the filter. The folded second-order section is cascaded with three similar structures to realize the eighth-order digital GTF, whose response closely matches that of the human cochlea. With the folding architecture, the silicon area is reduced, from twenty multipliers to four and from twelve adders to four.
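Folding time-multiplexes several operations onto one hardware unit. The sketch below computes a generic second-order (biquad) section, y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2], once directly with five multiplies per sample and once in a folded form that routes all five products, one per clock step, through a single multiply-accumulate unit. This difference equation is assumed for illustration; the gammatone filter's actual coefficients and the paper's folding schedule are not reproduced.

```python
from typing import List, Sequence


def biquad_direct(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """b = (b0, b1, b2), a = (a1, a2); direct computation with zero initial state."""
    x1 = x2 = y1 = y2 = 0.0
    out: List[float] = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def biquad_folded(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Same filter with one shared multiplier and adder reused over 5 steps per sample."""
    coeffs = (b[0], b[1], b[2], -a[0], -a[1])
    state = [0.0] * 5  # x[n], x[n-1], x[n-2], y[n-1], y[n-2]
    out: List[float] = []
    for xn in x:
        state[0] = xn
        acc = 0.0
        for step in range(5):  # one clock step per product (folding factor 5)
            acc = acc + coeffs[step] * state[step]  # the single shared multiply-accumulate
        out.append(acc)
        state[2], state[1] = state[1], xn
        state[4], state[3] = state[3], acc
    return out


b, a = (0.2, 0.3, 0.2), (-0.5, 0.25)
print(biquad_folded([1.0, 0.0, 0.0, 0.0], b, a))
```

The folded version trades throughput for hardware: it needs five clock steps per output sample, so the clock must run five times faster than the sample rate to keep up.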

19.
Algorithm and Architecture Design of the Logarithmic Skip Adder   Total citations: 5, self-citations: 0, citations by others: 5
Jia Song  Liu Fei  Liu Ling  Chen Zhongjian  Ji Lijiu  Acta Electronica Sinica, 2003, 31(8): 1186-1189
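This paper introduces a new adder architecture, the logarithmic skip adder, which combines the carry-skip adder and tree-structured carry-lookahead adder algorithms. The carry chain within each skip group is replaced by a binary-tree carry-lookahead structure, so the path delay within a group grows logarithmically with operand length. The architecture thus combines the small area and low power of the conventional carry-skip structure with the speed advantage of the ELM tree CLA. In the architecture design, Ling's algorithm is used to design the carry-merge structure, embedding the initial carry into the carry chain without increasing the critical path delay. The 32-bit logarithmic skip adder has a maximum fanout of 5 and a critical path of 8 gate delays, with a regular structure that is easy to integrate. Spectre circuit simulation shows that, in a 0.25-μm CMOS process, the critical path delay of the 32-bit adder is 760 ps and the power consumption at 100 MHz is 5.2 mW.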
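The carry-skip half of the logarithmic skip adder can be sketched behaviorally: a group whose bits all propagate passes its carry-in straight to its carry-out through a skip multiplexer, while any other group produces a carry-out that does not depend on its carry-in. The in-group carry chain, which the paper builds as a binary-tree carry-lookahead structure with Ling's carry-merge logic, is modeled here as ordinary integer addition; the group size and function name are assumptions.

```python
from typing import Tuple


def carry_skip_add(a: int, b: int, width: int = 32, group: int = 4) -> Tuple[int, int]:
    """Behavioral carry-skip adder: returns ((a + b) mod 2**width, carry_out)."""
    if width % group:
        raise ValueError("width must be a multiple of the group size")
    mask = (1 << group) - 1
    result, carry = 0, 0
    for base in range(0, width, group):
        x = (a >> base) & mask
        y = (b >> base) & mask
        # In-group sum with the incoming carry (a tree CLA in the paper; plain addition here).
        result |= ((x + y + carry) & mask) << base
        block_propagate = (x ^ y) == mask  # every bit in the group propagates
        # Skip multiplexer: a fully propagating group forwards its carry-in unchanged;
        # any other group's carry-out is independent of its carry-in.
        carry = carry if block_propagate else (x + y) >> group
    return result, carry


print(carry_skip_add(0xFFFFFFFF, 1))  # the carry skips across every group: prints (0, 1)
```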

20.
This paper describes improvements to the parallel prefix adder designs and optimization algorithms of Chan, Oklobdzija, Schlag, Thomborson, and Wei. Our “direct feeding” (DF) adder design avoids large signal fanouts along critical adder paths. Our “random pruning” heuristic limits the time and space required to find near-optimal DF adders, so that the search runs in a few minutes on a Sun-4 workstation. Our improved carry lookahead adders are well suited to static CMOS implementation, and our improvements may be applied to other parallel prefix CMOS circuits. Simulations with Mentor Graphics' Lsim indicate that our best DF adders are 12% to 20% faster than the carry lookahead adders presented by Chan et al.
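Parallel prefix adders compute all carries with the associative operator (G, P) o (G', P') = (G | P G', P P'). The Sklansky (divide-and-conquer) structure reaches log2(n) levels, but at its last level a single node drives n/2 others, which is the kind of large fanout the direct-feeding design is said to avoid. The sketch below builds a Sklansky prefix adder, checks it against integer addition, and reports the largest fanout of any prefix node; the Sklansky choice and all names are assumptions, not the DF design or its random-pruning search.

```python
from collections import Counter
from typing import Tuple


def sklansky_add(a: int, b: int, width: int = 32) -> Tuple[int, int, int]:
    """Sklansky parallel-prefix adder. Returns (sum mod 2**width, carry_out, max_fanout)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]
    max_fanout = 0
    level = 0
    while (1 << level) < width:
        uses: Counter = Counter()
        new_G, new_P = G[:], P[:]
        for i in range(width):
            if (i >> level) & 1:  # i lies in the upper half of its 2**(level+1) block
                j = ((i >> level) << level) - 1  # last node of the lower half
                new_G[i] = G[i] | (P[i] & G[j])  # prefix operator (G, P)[i] o (G, P)[j]
                new_P[i] = P[i] & P[j]
                uses[j] += 1
        G, P = new_G, new_P
        max_fanout = max(max_fanout, max(uses.values(), default=0))
        level += 1
    # G[i] is now the carry out of bit i (carry-in 0), so bit i receives G[i - 1].
    total = 0
    for i in range(width):
        total |= (p[i] ^ (G[i - 1] if i else 0)) << i
    return total, G[width - 1], max_fanout


print(sklansky_add(0xFFFFFFFF, 1))  # prints (0, 1, 16): one node drives 16 others
```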

