期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient Realization of Large Size Two’s Complement Multipliers Using Embedded Blocks in FPGAs

Shuli Gao Dhamin Al-Khalili Noureddine Chabini 《Circuits, Systems, and Signal Processing》2008,27(5):713-731

This paper presents an optimized design approach for two’s complement large size multipliers using smaller size embedded multiplier blocks available as resources in field programmable gate arrays (FPGAs). The realization is based on the Baugh–Wooley algorithm, which segments the multiplication into unsigned and signed components. To achieve efficient implementation results, a set of optimized schemes for the realization of the additions required for the unsigned multipliers and for combining the unsigned and signed components is proposed. The implementations of the multipliers have been carried out for operands with sizes ranging from 20 to 128 bits. The designs are synthesized and implemented on Xilinx’s Spartan-3 in the ISE 8.1 design platform and compared with three other realizations using the following approaches: (1) conventional sign-extension approach, (2) Xilinx’s IP-Core generator, and (3) sign-and-magnitude-based approach. The experimental results indicate that our proposed method outperforms the other techniques. 相似文献

2.

BZ-FAD: A Low-Power Low-Area Multiplier Based on Shift-and-Add Architecture

Mottaghi-Dastjerdi M. Afzali-Kusha A. Pedram M. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(2):302-306

In this paper, a low-power structure called bypass zero, feed A directly (BZ-FAD) for shift-and-add multipliers is proposed. The architecture considerably lowers the switching activity of conventional multipliers. The modifications to the multiplier which multiplies A by B include the removal of the shifting the B register, direct feeding of A to the adder, bypassing the adder whenever possible, using a ring counter instead of a binary counter and removal of the partial product shift. The architecture makes use of a low-power ring counter proposed in this work. Simulation results for 32-bit radix-2 multipliers show that the BZ-FAD architecture lowers the total switching activity up to 76% and power consumption up to 30% when compared to the conventional architecture. The proposed multiplier can be used for low-power applications where the speed is not a primary design parameter. 相似文献

3.

一种结构新颖的流水线Booth乘法器设计

李飞雄蒋林《电子科技》2013,26(8):46-48,67

在对传统Booth乘法器研究的基础上,介绍了一种结构新颖的流水线型布什(Booth)乘法器。使用基-4 Booth编码、华莱士树(Wallace Tree)压缩结构、64位Kogge-Stone前缀加法器实现,并在分段实现的64位Kogge-Stone前缀加法器中插入4级流水线寄存器,实现32 t×32 bit无符号和有符号数快速乘法。用硬件描述语言设计该乘法器,使用现场可编程门阵列(Field Programmable Gate Array,FPGA)进行验证,并采用SMIC 0.18 μm CMOS标准单元工艺对该乘法器进行综合。综合结果表明,电路的关键路径延时为3.6 ns,芯片面积＜0.134 mm,功耗＜32.69 mW。相似文献

4.

Reconfigurable Rounding Based Approximate Multiplier for Energy Efficient Multimedia Applications

Garg Bharat Patel Sujit 《Wireless Personal Communications》2021,118(2):919-931

The approximate design has emerged as a revolutionary design paradigm to obtain energy efficient digital signal processing cores while exhibiting acceptable accuracy. In different signal processing architectures, multiplier is the prime arithmetic unit and significantly influences the performance of these cores. Therefore, four novel energy efficient rounding based approximate (RBA) multiplier architectures are proposed in this paper. These multipliers first approximate input operands to the nearest power of two values and then achieve multiplication using few adders and shifters. The proposed RBA multipliers significantly reduce implementation complexity and provide higher energy efficiency. Further, a novel reconfigurable rounding based approximate (RRBA) multiplier is proposed to achieve desired performance-quality tradeoff. Further, the performance of proposed RBA and RRBA multipliers is evaluated and analysed over the existing approximate multiplier architectures. The proposed 8-bit RBA0 requires 59.8% (54.7%) reduced area (delay) compared to the existing approximate multiplier. Finally, the efficacy of the proposed multipliers is demonstrated in the application by implementing Gaussian filters embedded with existing and proposed approximate multipliers. The Gaussian filter designed using RBA0 provides 32.5% reduced energy consumption over the filter with existing multiplier.

相似文献

5.

Low‐Power and Low‐Hardware Bit‐Parallel Polynomial Basis Systolic Multiplier over GF(2m) for Irreducible Polynomials

下载免费PDF全文

Sudha Ellison Mathe Lakshmi Boppana 《ETRI Journal》2017,39(4):570-581

Multiplication in finite fields is used in many applications, especially in cryptography. It is a basic and the most computationally intensive operation from among all such operations. Several systolic multipliers are proposed in the literature that offer low hardware complexity or high speed. In this paper, a bit‐parallel polynomial basis systolic multiplier for generic irreducible polynomials is proposed based on a modified interleaved multiplication method. The hardware complexity and delay of the proposed multiplier are estimated, and a comparison with the corresponding multipliers available in the literature is presented. Of the corresponding multipliers, the proposed multiplier achieves a reduction in the hardware complexity of up to 20% when compared to the best multiplier for m = 163. The synthesis results of application‐specific integrated circuit and field‐programmable gate array implementations of the proposed multiplier are also presented. From the synthesis results, it is inferred that the proposed multiplier achieves low power consumption and low area complexitywhen compared to the best of the corresponding multipliers. 相似文献

6.

Multiplication Acceleration Through Twin Precision

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(9):1233-1246

We present the twin-precision technique for integer multipliers. The twin-precision technique can reduce the power dissipation by adapting a multiplier to the bitwidth of the operands being computed. The technique also enables an increased computational throughput, by allowing several narrow-width operations to be computed in parallel. We describe how to apply the twin-precision technique also to signed multiplier schemes, such as Baugh–Wooley and modified-Booth multipliers. It is shown that the twin-precision delay penalty is small (5%–10%) and that a significant reduction in power dissipation (40%–70%) can be achieved, when operating on narrow-width operands. In an application case study, we show that by extending the multiplier of a general-purpose processor with the twin-precision scheme, the execution time of a Fast Fourier Transform is reduced with 15% at a 14% reduction in datapath energy dissipation. All our evaluations are based on layout-extracted data from multipliers implemented in 130-nm and 65-nm commercial process technologies. 相似文献

7.

A Signed Array Multiplier with Bypassing Logic

Chua-Chin Wang Chia-Hao Hsu Gang-Neng Sung Yu-Cheng Lu 《Journal of Signal Processing Systems》2012,66(2):87-92

A low power digital signed array multiplier based on a 2-dimensional (2-D) bypassing technique is proposed in this work. When the horizontally (row) or the vertically (column) operand is zero, the corresponding bypassing cells skip redundant signal transitions to avoid unnecessary calculation to reduce power dissipation. An 8×8 signed multiplier using the 2-D bypassing technique is implemented on silicon using a standard 0.18 μm CMOS process to verify power reduction performance. The power-delay product of the proposed 8×8 signed array multiplier is measured to be 31.74 pJ at 166 MHz, which is significantly reduced in comparison with prior works. 相似文献

8.

Error-Efficient Approximate Multiplier Design using Rounding Based Approach for Image Smoothing Application

Rao E. Jagadeeswara Samundiswary P. 《Journal of Electronic Testing》2021,37(5-6):623-631

We propose a novel, error-efficient approximate multiplier (EEAM), which is based on a rounding-based approach (RBA). Multiplication is performed using rounding, shift, and add operations. We round the input operands to the nearest power of two using RBA. The modified inputs are processed by an arithmetic block (AB), which consists of addition, subtraction, and shifter blocks. The proposed approximate multiplier has input operands whose widths range from 8-bit to 32-bits. We simulated the proposed multiplier by using Vivado and MATLAB. The proposed multiplier is also synthesized using the Cadence RTL compiler, and compared to prior approximate multiplier proposals, EEAM’s delay and energy consumption are about of 22% and 57% better than the best known approximate multipliers. We also show that the proposed approximate multiplier’s worst-case error, mean error distance, mean relative error distance, and normalized error distance are about 3%, 44%, 45%, and 13% improvement over existing approximate multipliers. Finally, we used the proposed approximate multiplier in an image smoothing filter., For this application, we observed that our multiplier provides higher PSNR and SSIM than any prior approximate multiplier.

相似文献

9.

一种高速DSP中延迟优化的乘累加单元的设计与实现

下载免费PDF全文

Sheraz Anjum 陈杰李海军《电子器件》2007,30(4):1375-1379

乘累加单元是任何数字信号处理器(DSP)数据通路中的一个关键部分.多年来,硬件工程师们一直倾注于其优化与改进.本文描述了一种速度优化的乘累加单元的设计与实现.本文的乘累加单元是为一种高速VLIW结构的DSP核设计,能够进行16×16 40的无符号和带符号的二进制补码操作.在关键路径延迟上,本文的乘累加单元比其他任何使用相同或不同算数技术实现的乘累加单元都更优.本文的乘累加单元已成功使用于synopsys的工具,并与synopsys的Design Ware库中相同位宽的乘累加单元比较.比较结果表明,本文的乘累加单元比Design Ware库中的任何其他实现都要快,适合于在需要高吞吐率的DSP核中使用.注意:比较是在Design compiler中使用相同属性和开关下进行的. 相似文献

10.

High speed multiplier using Nikhilam Sutra algorithm of Vedic mathematics

Manoranjan Pradhan Rutuparna Panda 《International Journal of Electronics》2013,100(3):300-307

This article presents the design of a new high-speed multiplier architecture using Nikhilam Sutra of Vedic mathematics. The proposed multiplier architecture finds out the compliment of the large operand from its nearest base to perform the multiplication. The multiplication of two large operands is reduced to the multiplication of their compliments and addition. It is more efficient when the magnitudes of both operands are more than half of their maximum values. The carry save adder in the multiplier architecture increases the speed of addition of partial products. The multiplier circuit is synthesised and simulated using Xilinx ISE 10.1 software and implemented on Spartan 2 FPGA device XC2S30-5pq208. The output parameters such as propagation delay and device utilisation are calculated from synthesis results. The performance evaluation results in terms of speed and device utilisation are compared with earlier multiplier architecture. The proposed design has speed improvements compared to multiplier architecture presented in the literature. 相似文献

11.

Low power multipliers based on new hybrid full adders

Z. Abid 《Microelectronics Journal》2008,39(12):1509-1515

Five hybrid full adder designs are proposed for low power parallel multipliers. The new adders allow NAND gates to generate most of the multiplier partial product bits instead of AND gates, thereby lowering the power consumption and the total number of needed transistors. For an 8×8 implementation, the ALL-NAND array multiplier achieves 15.7% and 7.8% reduction in power consumption and transistor count at the cost of a 6.9% increase in time delay compared to standard array multiplier. The ALL-NAND tree multiplier exhibits lower power consumption and transistor count by 12.5% and 7.3%, respectively, with a 4.4% longer time delay, compared to conventional tree multiplier. 相似文献

12.

Hybrid low-latency serial-parallel multiplier architecture

Al-Besher B. Bouridane A. Ashur A.S. Crookes D. 《Electronics letters》1998,34(2):141-143

A novel low latency, most significant digit-first, signed digit multiplier architecture is presented. The design of the multiplier is based on a new 2 bit adder cell. Judicious deployment of latches in the circuit ensures that the multiplier operates on two coefficients of the multiplicand at the same time and produces one 2n digit product every 2n+3 cycles with an initial delay (latency) of three cycles. Comparison with existing multipliers has shown a superior performance of the proposed architecture 相似文献

13.

Design of a low-power, high performance, 8×8 bit multiplier using a Shannon-based adder cell

C. Senthilpari Ajay Kumar Singh K. Diwakar 《Microelectronics Journal》2008,39(5):812-821

In this paper, we have developed a new full-adder cell using multiplexing control input techniques (MCIT) for the sum operation and the Shannon-based technique to implement the carry. The proposed adder cell is applied to the design of several 8-bit array multipliers, namely a Braun array multiplier, a CSA multiplier, and Baugh–Wooley multipliers. The multiplier circuits are designed using DSCH2 VLSI CAD tools and their layouts are generated by Microwind 3 VLSI CAD tools. The output parameters such as propagation delay, total chip area, and power dissipation are calculated from the simulated results. We have also calculated energy per instruction (EPI), throughput, latency, signal-to-noise ratio (SNR), and the effect of temperature on the drain current by using the generated layout output parameter of a BSIM 4 advanced analyzer. The simulated results of the proposed adder-based multiplier circuit are compared with a cell multiplier that utilizes a MCIT-based adder, a cell multiplier composed of complementary pass transistor logic-based (CPL) adders and those of other published multipliers circuits. From the analysis of these simulated results, it was found that the proposed multiplier circuit gives better performance in terms of power, propagation delay, latency and throughput than other published results. 相似文献

14.

面向OFDM应用的低硬件开销低功耗64点FFT处理器设计

于建《电讯技术》2020,(3):338-343

在基于正交频分复用(Orthogonal Frequency Division Multiplexing,OFDM)的无线系统中,快速傅里叶变换(Fast Fourier Transform,FFT)作为关键模块,消耗着大量的硬件资源。为此,针对于IEEE802. 11a标准的无线局域网基带技术,提出了一种低硬件开销、低功耗的基-24算法流水线架构FFT处理器设计方案。在硬件实现上,采用单路延迟负反馈(Single-path Delay Feedback,SDF)流水线架构;为了降低硬件资源消耗,基于新型的改良蝶形架构利用正则有符号数(Canonical Signed Digit,CSD)常数乘法器替代布斯乘法器完成所有的复数乘法运算。设计采用QUARTUS PRIME工具进行开发,搭配Cyclone 10 LP系列器件,编译结果显示该方案与其他已存在的方案相比,至少节约硬件成本25%,降低功耗18%。相似文献

15.

Low-Power Multiplier Design Using a Bypassing Technique

Chua-Chin Wang Gang-Neng Sung 《Journal of Signal Processing Systems》2009,57(3):331-338

This paper presents a low power digital multiplier design by taking advantage of a 2-dimensional bypassing method. The proposed bypassing cells constituting the multiplier skip redundant signal transitions as well as computations when the horizontally partial product or the vertical operand is zero. Hence, it is a 2-dimensional bypassing method. Thorough post-layout simulations of a 8×8 digital multiplier using the proposed 2-dimensional bypassing method show that the power dissipation of the proposed design is reduced by more than 75% compared to prior designs. Physical measurements on silicon reveal that the proposed digital multiplier saves more than 28% even with pads’ power dissipation compared to the prior works. 相似文献

16.

基于改进的布斯算法FPGA嵌入式18×18乘法器 总被引：1，自引：1，他引：0

王鲁豫陈春深国磊《现代电子技术》2012,35(8):154-156

设计了一款嵌入FPGA的乘法器,该乘法器能够满足两个18b有符号或17b无符号数的乘法运算。该设计基于改进的布斯算法,提出了一种新的布斯译码和部分积结构,并对9-2压缩树和超前进位加法器进行了优化。该乘法器采用TSMC 0.18μm CMOS工艺,其关键路径延迟为3.46ns。相似文献

17.

基于FPGA乘法器架构的RNS与有符号二进制量转换 总被引：1，自引：1，他引：0

叶春张曦煌《微电子学与计算机》2005,22(11):148-150,153

RNS(余数数制系统)是一种整数运算系统,在粒度精确性,能源损耗和响应速度上有很大的优势.从RNS到二进制数的输入输出转换是基于余数算法的专用架构实现的关键.本文提出了一个基于N类模的RNS与有符号二进制量的通用转换算法在FPGAs的乘法器上的实现过程.该算法能更有效地进行有符号数与RNS的转换.基于该算法类型乘法器在同类型乘法器中显示出了速度优势.文章中该架构被映射到Altera的10K系列的FPGA上. 相似文献

18.

An 8.8-ns 54×54-bit multiplier with high speed redundantbinary architecture

Makino H. Nakase Y. Suzuki H. Morinaka H. Shinohara H. Mashiko K. 《Solid-State Circuits, IEEE Journal of》1996,31(6):773-783

A high speed redundant binary (RB) architecture, which is optimized for the fast CMOS parallel multiplier, is developed. This architecture enables one to convert a pair of partial products in normal binary (NB) form to one RE number with no additional circuit. We improved the RB adder (RBA) circuit so that it can make a fast addition of the RB partial products. We also simplified the converter circuit that converts the final RE number into the corresponding NE number. The carry propagation path of the converter circuit is carried out with only multiplexer circuits. A 54×54-bit multiplier is designed with this architecture. It is fabricated by 0.5 μm CMOS with triple level metal technology. The active area size is 3.0×3.08 mm² and the number of transistors is 78,800. This is the smallest number for all 54×54-bit multipliers ever reported. Under the condition of 3.3 V supply voltage, the chip achieves 8.8 ns multiplication time. The power dissipation of 540 mW is estimated for the operating frequency of 100 MHz. These are, so far, the fastest speed and the lowest power for 54×54-bit multipliers with 0.5-μm CMOS 相似文献

19.

基于部分积概率分析的高精度低功耗近似浮点乘法器设计

闫成刚赵轩徐宸宇陈珂葛际鹏王成华刘伟强《电子与信息学报》2023,45(1):87-95

浮点乘法器是高动态范围(HDR)图像处理、无线通信等系统中的关键运算单元,其相比于定点乘法器动态范围更广,但复杂度更高。近似计算作为一种新兴范式,在受限的精度损失范围内,可大幅降低硬件资源和功耗开销。该文提出一种16 bit半精度近似浮点乘法器(App-Fp-Mul),针对浮点乘法器中的尾数乘法模块,根据其部分积阵列中出现1的概率,提出一种对输入顺序不敏感的近似4-2压缩器及低位或门压缩方法,在精度损失较小的条件下有效降低了浮点乘法器资源及功耗。相较于精确设计,所提近似浮点乘法器在归一化平均错误距离(NMED)为0.0014时,面积及功耗延时积方面分别降低20%及58%;相较于现有近似设计,在近似位宽相同时具有更高的精度及更小的功耗延时积。最后将该文所提近似浮点乘法器应用于高动态范围图像处理,相比现有主流方案,峰值信噪比和结构相似性分别达到83.16 dB 和 99.9989%,取得了显著的提升。相似文献

20.

Minimization of switching activities of partial products for designing low-power multipliers

Chen O.T.-C. Sandy Wang Yi-Wen Wu 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(3):418-433

This work presents low-power 2's complement multipliers by minimizing the switching activities of partial products using the radix-4 Booth algorithm. Before computation for two input data, the one with a smaller effective dynamic range is processed to generate Booth codes, thereby increasing the probability that the partial products become zero. By employing the dynamic-range determination unit to control input data paths, the multiplier with a column-based adder tree of compressors or counters is designed. To further reduce power consumption, the two multipliers based on row-based and hybrid-based adder trees are realized with operations on effective dynamic ranges of input data. Functional blocks of these two multipliers can preserve their previous input states for noneffective dynamic data ranges and thus, reduce the number of their switching operations. To illustrate the proposed multipliers exhibiting low-power dissipation, the theoretical analyzes of switching activities of partial products are derived. The proposed 16 /spl times/ 16-bit multiplier with the column-based adder tree conserves more than 31.2%, 19.1%, and 33.0% of power consumed by the conventional multiplier, in applications of the ADPCM audio, G.723.1 speech, and wavelet-based image coders, respectively. Furthermore, the proposed multipliers with row-based, hybrid-based adder trees reduce power consumption by over 35.3%, 25.3% and 39.6%, and 33.4%, 24.9% and 36.9%, respectively. When considering product factors of hardware areas, critical delays and power consumption, the proposed multipliers can outperform the conventional multipliers. Consequently, the multipliers proposed herein can be broadly used in various media processing to yield low-power consumption at limited hardware cost or little slowing of speed. 相似文献