首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 109 毫秒
1.
In this paper, a low-power structure called bypass zero, feed A directly (BZ-FAD) for shift-and-add multipliers is proposed. The architecture considerably lowers the switching activity of conventional multipliers. The modifications to the multiplier which multiplies A by B include the removal of the shifting the B register, direct feeding of A to the adder, bypassing the adder whenever possible, using a ring counter instead of a binary counter and removal of the partial product shift. The architecture makes use of a low-power ring counter proposed in this work. Simulation results for 32-bit radix-2 multipliers show that the BZ-FAD architecture lowers the total switching activity up to 76% and power consumption up to 30% when compared to the conventional architecture. The proposed multiplier can be used for low-power applications where the speed is not a primary design parameter.  相似文献   

2.
To design a power-efficient digital signal processor, this study develops a fundamental arithmetic unit of a low-power adder that operates on effective dynamic data ranges. Before performing an addition operation, the effective dynamic ranges of two input data are determined. Based on a larger effective dynamic range, only selected functional blocks of the adder are activated to generate the desired result while the input bits of the unused functional blocks remain in their previous states. The added result is then recovered to match the required word length. Using this approach to reduce switching operations of noneffective bits allows input data in 2's complement and sign magnitude representations to have similar switching activities. This investigation thus proposes a 2's complement adder with two master-stage and slave-stage flip-flops, a dynamic-range determination unit and a sign-extension unit, owing to the easy implementation of addition and subtraction in such a system. Furthermore, this adder has a minimum number of transistors addressed by carry-in bits and thus is designed to reduce the power consumption of its unused functional blocks. The dynamic range and sign-extension units are explored in detail to minimize their circuit area and power consumption. Experimental results demonstrate that the proposed 32-bit adder can reduce power dissipation of conventional low-power adders for practical multimedia applications. Besides the ripple adder, the proposed approach can be utilized in other adder cells, such as carry lookahead and carry-select adders, to compromise complexity, speed and power consumption for application-specific integrated circuits and digital signal processors.  相似文献   

3.
随着大数据、云计算、物联网等技术的兴起,终端设备在硬件开销和供电方面面临巨大挑战,对于新型高效低功耗运算单元的需求日益迫切。针对运算单元功耗高的问题,提出了一种新型高效低功耗的近似Booth乘法器,可应用于图像处理、多媒体处理、模式识别等可容错应用领域。实验结果表明,与已有乘法器相比,所提出的近似Booth乘法器在功耗和延时方面分别降低了19.3%和28.6%,在面积方面节省了29.0%。同时,所提出的近似Booth乘法器的运算精度也具备一定的优势。最后,在高斯滤波的应用中验证了所提出的近似Booth乘法器的实用性。  相似文献   

4.
In this paper, we propose a partitioning and gating technique for the design of a high performance and low-power multiplier for kernel-based operations such as 2D convolution in video processing applications. The proposed technique reduces dynamic power consumption by analyzing the bit patterns in the input data to reduce switching activities. Special values of the pixels in the video streams such as zero, repeated values or repeated bit combinations are detected and data paths in the architecture design are disabled appropriately to eliminate unnecessary switching. Input pixels in the video stream are partitioned into halves to increase the possibility of detecting special values. It is observed that the proposed scheme helps to reduce dynamic power consumption in the 2D convolution operations up to 33%.  相似文献   

5.
This paper provides the experience of applying an advanced version of our former spurious power suppression technique (SPST) on multipliers for high-speed and low-power purposes. To filter out the useless switching power, there are two approaches, i.e., using registers and using and gates, to assert the data signals of multipliers after the data transition. The SPST has been applied on both the modified Booth decoder and the compression tree of multipliers to enlarge the power reduction. The simulation results show that the SPST implementation with AND gates owns an extremely high flexibility on adjusting the data asserting time which not only facilitates the robustness of SPST but also leads to a 40% speed improvement. Adopting a 0.18-mum CMOS technology, the proposed SPST-equipped multiplier dissipates only 0.0121 mW per MHz in H.264 texture coding applications, and obtains a 40% power reduction.  相似文献   

6.
67×67位乘法器的改进四阶Booth算法实现   总被引:1,自引:0,他引:1       下载免费PDF全文
针对67×67位乘法器,提出并实现新型的设计方法.先提出改进的四阶Booth算法,对乘数编码,以减少部分积的数目,提高压缩速度和减少面积,再研究优化和分配方法,对部分积和进位信号以及一个134位的补偿向量进行优化分配,并对部分积压缩,最后研究K-S加法器的改进方法,求和以实现134位乘积.采用TSMC的0.18 μm工艺库,Synopsys的Design compiler工具和Altera的Quautus4.2工具分析结果表明,基于本文方法实现的电路比DesignWare自带的乘法器实现的电路相比,性能总体占优.  相似文献   

7.
In this paper we present circuit techniques for CMOS low-power high-performance multiplier design. Novel full adder circuits were simulated and fabricated using 0.8-μm CMOS (in BiCMOS) technology. The complementary pass-transistor logic-transmission gate (CPL-TG) full adder implementation provided an energy savings of 50% compared to the conventional CMOS full adder. CPL implementation of the Booth encoder provided 30% power savings at 15% speed improvement compared to the static CMOS implementation. Although the circuits were optimized for (16×16)-b multiplier using the Booth algorithm, a (6×6)-b implementation was used as a test vehicle in order to reduce simulation time. For the (6×6)-b case, implementation based on CPL-TG resulted in 18% power savings and 30% speed improvement over conventional CMOS  相似文献   

8.
This paper describes the design of a 16×16 redundant binary multiplier for signed 2's complement numbers. The multiplier uses a new coding scheme for representing radix-2 signed digits. The coding results in a factor of two reduction in the number of summands used with respect to the modified Booth algorithm. The design has a small number of modular cells and regular routing, making it suitable for automatic synthesis of larger data-width multipliers. In addition, the row-based redundant binary adder tree is an ideal structure for high-throughput applications.This work was supported in part by National Science Foundation grant MIP-9019862.  相似文献   

9.
We address high-level synthesis of low-power digital signal processing (DSP) systems by using efficient switching activity models. We present a technology-independent hierarchical scheme that can be easily integrated into current communications/DSP CAD tools for comparing the relative power/performance of two competing DSP designs without specific knowledge of transistor-level details. The basic building blocks considered for such systems are a full adder, a half adder, and a one-bit delay. Estimates of the switching activity at the output of these primitives are used to model the activity in more complex building blocks of DSP systems. The presented hierarchical method is very fast and simple. The accuracy of estimates obtained using the proposed approach is shown to be within 4% of the results obtained using extensive bit-level simulations. Our approach shows that the choice of multiplier/multiplicand is important when using array multipliers in a datapath. If the input signal with smaller mean square value is chosen as the multiplicand, almost 20% savings in switching activity can be achieved. This observation is verified by an analog simulation using a 16 × 16 bit array multiplier implemented in a 0.6-μ process with 3.3 V supply voltage  相似文献   

10.
Five hybrid full adder designs are proposed for low power parallel multipliers. The new adders allow NAND gates to generate most of the multiplier partial product bits instead of AND gates, thereby lowering the power consumption and the total number of needed transistors. For an 8×8 implementation, the ALL-NAND array multiplier achieves 15.7% and 7.8% reduction in power consumption and transistor count at the cost of a 6.9% increase in time delay compared to standard array multiplier. The ALL-NAND tree multiplier exhibits lower power consumption and transistor count by 12.5% and 7.3%, respectively, with a 4.4% longer time delay, compared to conventional tree multiplier.  相似文献   

11.
提出了一种新的嵌入在FPGA中可重构的流水线乘法器设计.该设计采用了改进的波茨编码算法,可以实现18×18有符号乘法或17×17无符号乘法.还提出了一种新的电路优化方法来减少部分积的数目,并且提出了一种新的乘法器版图布局,以便适应tilebased FPGA芯片设计所加的约束.该乘法器可以配置成同步或异步模式,也町以配置成带流水线的模式以满足高频操作.该设计很容易扩展成不同的输入和输出位宽.同时提出了一种新的超前进位加法器电路来产生最后的结果.采用了传输门逻辑来实现整个乘法器.乘法器采用了中芯国际0.13μm CMOS工艺来实现,完成18×18的乘法操作需要4.1ns.全部使用2级的流水线时,时钟周期可以达到2.5ns.这比商用乘法器快29.1%,比其他乘法器快17.5%.与传统的基于查找表的乘法器相比,该乘法器的面积为传统乘法器面积的1/32.  相似文献   

12.
This paper presents a low power and high speed row bypassing multiplier. The primary power reductions are obtained by tuning off MOS components through multiplexers when the operands of multiplier are zero. Analysis of the conventional DSP applications shows that the average of zero input of operand in multiplier is 73.8 percent. Therefore, significant power consumption can be reduced by the proposed bypassing multiplier. The proposed multiplier adopts ripple-carry adder with fewer additional hardware components. In addition, the proposed bypassing architecture can enhance operating speed by the additional parallel architecture to shorten the delay time of the proposed multiplier. Both unsigned and signed operands of multiplier are developed. Post-layout simulations are performed with standard TSMC 0.18 μm CMOS technology and 1.8 V supply voltage by Cadence Spectre simulation tools. Simulation results show that the proposed design can reduce power consumption and operating speed compared to those of counterparts. For a 16×16 multiplier, the proposed design achieves 17 and 36 percent reduction in power consumption and delay, respectively, at the cost of 20 percent increase of chip area in comparison with those of conventional array multipliers. In addition, the proposed design achieves averages of 11 and 38 percent reduction in power consumption and delay with 46 percent less chip area in comparison with those counterparts for both unsigned and signed multipliers. The proposed design is suitable for low power and high speed arithmetic applications.  相似文献   

13.
李飞雄  蒋林 《电子科技》2013,26(8):46-48,67
在对传统Booth乘法器研究的基础上,介绍了一种结构新颖的流水线型布什(Booth)乘法器。使用基-4 Booth编码、华莱士树(Wallace Tree)压缩结构、64位Kogge-Stone前缀加法器实现,并在分段实现的64位Kogge-Stone前缀加法器中插入4级流水线寄存器,实现32 t×32 bit无符号和有符号数快速乘法。用硬件描述语言设计该乘法器,使用现场可编程门阵列(Field Programmable Gate Array,FPGA)进行验证,并采用SMIC 0.18 μm CMOS标准单元工艺对该乘法器进行综合。综合结果表明,电路的关键路径延时为3.6 ns,芯片面积<0.134 mm,功耗<32.69 mW。  相似文献   

14.
43位浮点流水线乘法器的设计   总被引:1,自引:0,他引:1       下载免费PDF全文
梁峰  邵志标  孙海珺   《电子器件》2006,29(4):1094-1096,1102
提出一种浮点流水线乘法器IP芯核。该乘法器采用改进的三阶Booth算法减少部分积数目,提出了一种压缩器混用的Wallace树结构压缩阵列,并对关键路径中的5-2压缩器、4—2压缩器和64位CLA加法器进行了优化设计,有效降低了乘法器的延时和面积。经FPGA仿真验证表明,该乘法器运算能力比Altera公司近期提供的同类乘法器单元快15.4%。  相似文献   

15.
浮点乘法器是高动态范围(HDR)图像处理、无线通信等系统中的关键运算单元,其相比于定点乘法器动态范围更广,但复杂度更高。近似计算作为一种新兴范式,在受限的精度损失范围内,可大幅降低硬件资源和功耗开销。该文提出一种16 bit半精度近似浮点乘法器(App-Fp-Mul),针对浮点乘法器中的尾数乘法模块,根据其部分积阵列中出现1的概率,提出一种对输入顺序不敏感的近似4-2压缩器及低位或门压缩方法,在精度损失较小的条件下有效降低了浮点乘法器资源及功耗。相较于精确设计,所提近似浮点乘法器在归一化平均错误距离(NMED)为0.0014时,面积及功耗延时积方面分别降低20%及58%;相较于现有近似设计,在近似位宽相同时具有更高的精度及更小的功耗延时积。最后将该文所提近似浮点乘法器应用于高动态范围图像处理,相比现有主流方案,峰值信噪比和结构相似性分别达到83.16 dB 和 99.9989%,取得了显著的提升。  相似文献   

16.
In this paper, a new architecture is proposed to reduce the area cost and power consumption of the decimal fixed-point multiplier. In the proposed sequential architecture, the partial product generation and selection cycles are reduced to one. Moreover, the elaborately selected easy multiples reduce the hardware requirement of the partial products selector. Subsequently, two partial products are accumulated with the iteration result in a redundant decimal format by a multi-operand redundant adder. The lower-significant half digits of the final product are iteratively converted in every cycle. On the other hand, the higher-significant half digits are converted by a carry-propagation adder in two extra cycles. After all, the area of the whole architecture is reduced significantly by not only the simpler partial product generation and accumulation architecture, but also the less registers. The synthesized result shows that the proposed sequential multiplier has a lower area cost and reasonable computation latency.  相似文献   

17.
Low-Power Constant-Coefficient Multiplier Generator   总被引:1,自引:0,他引:1  
Constant-coefficient multipliers are used in many DSP cores. A new low-power constant multiplier, with detailed design procedure, is presented. By using canonical sign-digit (CSD) number system, and introducing new simplification techniques and identities, the multiplier features a new algorithm to reduce logic depth for the Wallace-tree implementation. The method also reduces area and complexity.A generator written in C++ is used to generate technology-independent VHDL code of the constant multiplier for different input specifications. Synthesis results indicate the new design has smaller area and less power consumption while offering similar speed performance when compared with other multipliers.  相似文献   

18.
The use of signed-digit number systems in arithmetic circuits has the advantage of constant time addition. When signed-digit number systems are used in binary, they are referred as redundant binary. Here, we present a new encoding technique for generating redundant binary partial products for a multiplier, without using any additional hardware. We express each normal binary partial product in one's complement form, with an extra bit denoting the sign bit. The proposed redundant binary partial product generator (RBPPG) achieves the highest reduction in the number of partial products (75%) for a radix-4 multiplier. The carry-free nature of redundant binary adders is exploited to add the extra bits with the partial products, without using any extra adder stages. The new partial product generation (PPG) technique is shown to improve the speed of multipliers, with the least number of adder stages, irrespective of the multiplier size.  相似文献   

19.
Floating-point (FP) multipliers are the main energy consumers in many modern embedded digital signal processing (DSP) and multimedia systems. For lossy applications, minimizing the precision of FP multiplication operations under the acceptable accuracy loss is a well-known approach for reducing the energy consumption of FP multipliers. This paper proposes a multiple-precision FP multiplier to efficiently trade the energy consumption with the output quality. The proposed FP multiplier can perform low-precision multiplication that generates 8?, 14?, 20?, or 26-bit mantissa product through an iterative and truncated modified Booth multiplier. Energy saving for low-precision multiplication is achieved by partially suppressing the computation of mantissa multiplier. In addition, the proposed multiplier allows the bitwidth of mantissa in the multiplicand, multiplier, and output product to be dynamically changed when it performs different FP multiplication operations to further reduce energy consumption. Experimental results show that the proposed multiplier can achieve 59 %, 71 %, 73 %, and 82 % energy saving under 0.1 %, 1 %, 5 %, and 11 % accuracy loss, respectively, for the RGB-to-YUV & YUV-to-RGB conversion when compared to the conventional IEEE single-precision multiplier. In addition, the results also exhibit that the proposed multiplier can obtain more energy reduction than previous multiple-precision iterative FP multipliers.  相似文献   

20.
提出了一种基于静态分段补偿方法的近似乘法器。通过基于静态分段方法的Booth编码方法生成部分积阵列,并对生成的部分积阵列进行误差补偿优化以及近似压缩,以实现硬件性能和精度的折中。仿真结果显示,相比于综合工具生成的全精度乘法器,本设计在保持了较高精度水平的前提下,面积和功耗优化的比例达到了36.96%和35.95%。在图片边缘检测应用中,设计的峰值信噪比和结构相似性指标分别为26.10和98%,可见本设计在降低硬件资源消耗的同时,应用效果接近全精度乘法器。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号