首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
VLSI designs are typically data-independent and as such, they must produce the correct result even for the worst-case inputs. Adders in particular assume that addition must be completed within prescribed number of clock cycles, independently of the operands. While the longest carry propagation of an n-bit adder is n bits, its expected length is only O(log2 n) bits. We present a novel dual-mode adder architecture that reduces the average energy consumption in up to 50%. In normal mode the adder targets the O(log2 n)-bit average worst-case carry propagation chains, while in extended mode it accommodates the less frequent O(n)-bit chain. We prove that minimum energy is achieved when the adder is designed for O(log2 n) carry propagation, and present a circuit implementation. Dual-mode adders enable voltage scaling of the entire system, potentially supporting further overall energy reduction. The energy-time tradeoff obtained when incorporating such adders in ordinary microprocessor’s pipeline and other architectures is discussed.  相似文献   

2.
吕晓兰 《测控技术》2014,33(2):127-129
针对目前存在的缩1码模2~n+1加法器的优缺点,设计出一个有效的基于进位选择的缩1码模2~n+1加法器。在模加法器的进位计算中,采用进位选择计算代替传统的进位计算,进位计算前缀运算量明显减少。分析和实验结果表明,对于比较大的n值,进位选择缩1码模2~n+1加法器在保持较高运算速度的前提下,有效地提高了集成度。  相似文献   

3.
We design a 3-bit adder or a radix-8 full adder (FA) in quantum-dot cellular automata (QCA), where the 3-bit carry propagation path can be accommodated in one clock-zone. To achieve this, we introduce group majority signals similar to group propagate and generate signals in parallel prefix computations, use them to reformulate the carry expressions of a previous radix-4 FA, and as such we could extend it to higher radix FAs. Applying the aforementioned new interpretation of carry expressions (via group majority signals) on 3-bit adders, results in that only a single clock cycle is required for 12-bit (vs. the previous 8-bit) carry propagation, across four radix-8 FAs. Based on the proposed radix-8 QCA-FA, we realized 8-, 16-, 32-, 64, and 128-bit QCA adders via QCADesigner. Comparison of these adders with the previous radix-4 experiment, showed 9–41% speed up, and 57–76% area saving, for 16–128-bit adders, respectively. On the other hand, compared to the best previous radix-2 design, for the same bit widths, we experienced 57–172% speed up, but at the cost of 138–4% area increase, except for the 64 and 128-bit cases, where we also experienced 19% and 41% area saving, respectively.  相似文献   

4.

Parallel FIR filter is the prime block of many modern communication application such as MIMO, multi-point transceivers etc. But hardware replication problem of parallel techniques make the system more bulky and costly. Fast FIR algorithm (FFA) gives the best alternative to traditional parallel techniques. In this paper, FFA based FIR structures with different topologies of multiplier and adder are implemented. To optimize design different multiplication technique like add and shift method, Vedic multiplier and booth multiplier are used for computation. Various adders such as carry select adder, carry save adder and Han-Carlson adder are analyzed for improved performance of the FFA structure. The basic objective is to investigate the performance of these designs for the tradeoffs between area, delay and power dissipation. Comparative study is carried out among conventional and different proposed designs. The advantage of presented work is that; based on the constraints, one can select the suitable design for specific application. It also fulfils the literature gap of critical analysis of FPGA implementation of FFA architecture using different multiplier and adder topologies. Xilinx Vivado HLS tool is used to implement the proposed designs in VHDL.

  相似文献   

5.
描述了一款适用于超长指令字数字信号处理器的64位加法器的设计。该加法器高度可重构,可以支持2个64位数据的加法运算、4个32位数据的加法运算、8个16位数据的加法运算以及16个8位数据的加法运算。它结合了Brent-Kung对数超前进位加法器和进位选择加法器的优点,使得加法器的面积和连线减少了50%,而延时与加法器的长度的对数成正比。仿真结果表明,在典型工作条件下,采用0.18μm工艺库标准单元,其关键路径的延时为0.83ns,面积为0.149mm2,功耗仅为0.315mW。  相似文献   

6.
设计一款适用于高性能数字信号处理器的16位加法器。该加法器结合条件进位选择和条件“和”选择加法器的特点,支持可重构,可以进行2个16位数据或者4个8位数据的加法运算,同时对其进位链进行优化。相对于传统的条件进位选择加法器,在典型工作条件下,采用0.18μm工艺库标准单元,其延时降低46%,功耗降低5%。  相似文献   

7.
子字并行加法器能够有效提高多媒体应用程序的处理性能。基于门延迟模型对加法器原理及性能进行了分析,设计了进位截断和进位消除两种子字并行控制机制。在这两种机制的指导下,实现了多种子字并行加法器,并对它们的性能进行了比较和分析。结果表明进位消除机制相对于进位截断机制需要较短的延时,较少的逻辑门数以及较低的功耗。在各种子字并行加法器中,Kogge-Stone加法器具有最少的延迟时间,RCA加法器具有最少的逻辑门数和最低的功耗。研究结果可以用于指导子字并行加法器的设计与选择。  相似文献   

8.
This paper focuses on a novel design of an adder/subtractor-based incrementer/decrementer using quantum-dot cellular automata (QCA) technology. QCA is a promising nanotechnology that offers new techniques of computation and data transmission. We use the multilayer crossover technique in the proposed designs to achieve low latency and area for the scalability feature. Moreover, new designs of QCA half and full adders are proposed to improve the operating speed of the incrementer/decrementer. The working of the proposed designs is analyzed via the QCA simulator tool, and the results are compared with previous studies in terms of cell count, area, and latency. According to the analysis, the presented designs perform well; for example, the proposed 4-bit incrementer design shows an improvement of 65 % in terms of area usage and 3.2 times lower latency compared to its existing counterpart.  相似文献   

9.
Lane of parallel through carry in ternary optical adder   总被引:7,自引:0,他引:7  
At the present 50 to 100 microseconds are necessary for a liquid crystal to change its state from opacity to clarity; 1.14×10-5 microseconds are however proved to be enough for light to pass through a clarity liquid crystal device. Rooted from this great difference in time, an optical adder was constructed with parallel through carry lanes (PTCL) composed of liquid crystals. Because all carries in PTCL process in parallel, the carry delay in the ternary optical computer's adder is avoided. Eliminating the carry delay in adder of ternary optical computer by physical means, the PTCL is also applicable for other types of optical adders. Moreover a light diagram of the adder and one PTCL structure are provided.  相似文献   

10.
Quantum dot Cellular Automata (QCA) is a transistor less technology alternative to CMOS for developing low-power, high speed digital circuits. Adder circuits are broadly employed in all digital computation systems. In this paper, a novel coplanar QCA full adder circuit is proposed which is designed with minimum number of QCA cells. The proposed full adder requires only 13 QCA cells, an area of 0.008 μm2 and delay of about 2 clock cycles to implement its function. Then an efficient 4-bit Ripple Carry Adder (RCA) is designed based on the proposed full adder that performs higher end addition in an effective way. Simulations results are obtained precisely using QCA designer tool version 2.0.3. Also the simulation results shows that the proposed 4-bit Ripple Carry Adder (RCA) requires only 70 QCA cells, an area of 0.18 μm2 and delay of about 5 clock cycles to implement its function with enhanced performance in terms of latency, area and QCA Cost. From the comparisons, it is found that our work achieves over 55% improvement in QCA cell count.  相似文献   

11.
The performance and power of error resilient applications will rise with a decrease in designing complexness due to approximate computing. This paper includes the new method for the approximation of multipliers. Variable likelihood terms are produced by the alteration of partial products of the multiplier. Based on the probability statistics, the accumulation of altered partial products leads to the variation of logic complexity. Here the estimate is implemented in 2 variables of 16-bit multiplier and in the final stage with reverse carry propagate adder(RCPA). The reverse carry propagate adder have carry signal propagation from the most significant bit(MSB) to the least significant bit(LSB), which results in greater relevance to the input carry than the output carry. The technique of carry circulation in reverse order with delay variations increases the stability. Utilizing the RCPA in approximate multiplier provide 21% and 7% improvements in area and delay. On comparing, this structure is resilient to delay variations than the ideal approximate adder.  相似文献   

12.
基于算术加法测试生成,提出了VLSI中加法器的一种自测试方案:加法器产生自身所需的所有测试矢量.通过优化测试矢量的初值改进这些测试矢量,提高了其故障侦查、定位能力.借助于测试矢量左移、逻辑与操作等方式对加法器自测试进行了设计.对8位、16位、32位行波、超前进位加法器的实验结果表明,该自测试能实现单、双固定型故障的完全测试,其单、双故障定位率分别达到了95.570%,72.656%以上.该自测试方案可实施真速测试且不会降低电路的原有性能,其测试时间与加法器长度无关.  相似文献   

13.
Inner product computation is an important operation, invoked repeatedly in matrix multiplications. A high-speed inner product processor can be very useful (among many possible applications) in real-time signal processing. This paper presents the design of a fast inner product processor, with appreciably reduced latency and cost. The inner product processor is implemented with a tree of carry-propagate or carry-save adders; this structure is obtained with the incorporation of three innovations in the conventional multiply/add tree: (1) The leaf-multipliers are expanded into adder subtrees, thus achieving an O(log Nb) latency, where N denotes the number of elements in a vector and b the number of bits in each element. (2) The partial products, to be summed in producing an inner product, are reordered according to their “minimum alignments.” This reordering brings approximately a 20% saving in hardware—including adders and data paths. The reduction in adder widths also yields savings in carry propagation time for carry-propagate adders. (3) For trees implemented with carry-save adders, the partial product reordering also serves to truncate the carry propagation chain in the final propagation stage by 2 log b − 1 positions, thus significantly reducing the latency further. A form of the Baugh and Wooley algorithm is adopted to implement two's complement notation with changes only in peripheral hardware.  相似文献   

14.
A new construction adder based on Chinese abacus algorithm is presented in this paper. There are two kinds of beads used in this construction. Each column element has three higher beads with a weight of four and three lower beads with a weight of one. The proposed 32-bit adder contains eight column elements. The construction was simulated by the technology of TSMC 0.18 μm CMOS process. Layout was also made by the same technology. The maximum delay of the 32-bit abacus adder is 0.91 ns and 14% less than that of Carry Look-ahead Adders for 0.18 μm technology. The power consumption of the abacus adder is 3.1 mW and 28% less than that of Carry Look-ahead Adders for 0.18 μm technology. Recent researches are compared with the proposed adder. The construction was also simulated by Predictive Technology Model. The PTM results also presented. The use of Chinese abacus approach offers a competitive technique with respect to other adders.  相似文献   

15.
The adders are the vital arithmetic operation for any arithmetic operations like multiplication, subtraction, and division. Binary number additions are performed by the digital circuit known as the adder. In VLSI (Very Large Scale Integration), the full adder is a basic component as it plays a major role in designing the integrated circuits applications. To minimize the power, various adder designs are implemented and each implemented designs undergo defined drawbacks. The designed adder requires high power when the driving capability is perfect and requires low power when the delay occurred is more. To overcome such issues and to obtain better performance, a novel parallel adder is proposed. The design of adder is initiated with 1 bit and has been extended up to 32 bits so as verify its scalability. This proposed novel parallel adder is attained from the carry look-ahead adder. The merits of this suggested adder are better speed, power consumption and delay, and the capability in driving. Thus designed adders are verified for different supply, delay, power, leakage and its performance is found to be superior to competitive Manchester Carry Chain Adder (MCCA), Carry Look Ahead Adder (CLAA), Carry Select Adder (CSLA), Carry Select Adder (CSA) and other adders.  相似文献   

16.
This paper takes up a remark in the well-known paper of Alon, Matias, and Szegedy (J. Comput. Syst. Sci. 58(1):137–147, 1999) about the computation of the frequency moments of data streams and shows in detail how any F k with k≥1 can be approximately computed using space O(km 1−1/k (k+log m+log log  n)) based on approximate counting. An important building block for this, which may be interesting in its own right, is a new approximate variant of reservoir sampling using space O(log log  n) for constant error parameters.  相似文献   

17.
In this paper, we study the problem of locating a median path of limited length on a tree under the condition that some existing facilities are already located. The existing facilities may be located at any subset of vertices. Upper and lower bounds are proposed for both the discrete and continuous models. In the discrete model, a median path is not allowed to contain partial edges. In the continuous model, a median path may contain partial edges. The proposed upper bounds for these two models are O(n log n) and O(n log (n)), respectively. They improve the previous known bounds from O(n log2 n) and O(n2), respectively. The proposed lower bounds are both Ω(n log n).  相似文献   

18.
We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%.  相似文献   

19.
基于块匹配算法的运动估计是图像和视频应用中的关键技术。SAD运算是运动估计中最主要的运算形式,具有极高的计算复杂度和传输带宽需求。本文提出了一种可配置的SAD运算加速器结构,采用一个16×1规模的PE阵列和一个加法树结构加速SAD运算的执行。本文将PE阵列和加法树结构的流水线进行细致划分,有效提高了工作频率。加速器采用DMA事件机制,大部分的数据传输可以与SAD计算并行进行,减少了数据传输延迟引起的性能下降。实验结果显示,搜索16×16大小的搜索窗口,本文结构只需要4102个周期。基于SMIC0.13μm的CMOS标准单元工艺对本文结构进行综合,最高工作频率可达到750MHz,面积约为16.8k门和3.5KB的片上存储器。  相似文献   

20.
Mesh of trees (MOT) is well known for its small diameter, high bisection width, simple decomposability and area universality. On the other hand, OTIS (Optical Transpose Interconnection System) provides an efficient optoelectronic model for massively parallel processing system. In this paper, we present OTIS-MOT as a competent candidate for a two-tier architecture that can take the advantages of both the OTIS and the MOT. We show that an n4-n^{4}_{-} processor OTIS-MOT has diameter 8log n +1 (The base of the logarithm is assumed to be 2 throughout this paper.) and fault diameter 8log n+2 under single node failure. We establish other topological properties such as bisection width, multiple paths and the modularity. We show that many communication as well as application algorithms can run on this network in comparable time or even faster than other similar tree-based two-tier architectures. The communication algorithms including row/column-group broadcast and one-to-all broadcast are shown to require O(log n) time, multicast in O(n 2log n) time and the bit-reverse permutation in O(n) time. Many parallel algorithms for various problems such as finding polynomial zeros, sales forecasting, matrix-vector multiplication and the DFT computation are proposed to map in O(log n) time. Sorting and prefix computation are also shown to run in O(log n) time.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号