Similar Articles
18 similar articles found.
1.
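Implementation of a High-Speed Parallel Multiplier Based on Partial-Product Optimization   Total citations: 1 (self: 1, others: 0)
Improved structures are proposed for the partial-product generation and compression units. The partial-product generation algorithm is optimized, and a selector (multiplexer) structure replaces the conventional AND-OR gates, which improves the performance of the partial-product circuit while reducing the module's area and power consumption. Optimizing the compression unit increases the speed of partial-product compression. Synthesis and verification of a 16×16 parallel multiplier show that the improved multiplier is 14.5% faster, 7.1% smaller in area, and consumes 17.2% less power.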
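The abstract does not give the selector circuit or name the encoding. As a hedged illustration only, assuming radix-4 Booth partial products (our assumption), the sketch below compares the usual AND-OR Booth selector with a 2:1-multiplexer form and checks that both build the same ones'-complement row, which plus its correction bit `neg` equals digit × x modulo 2^width. All function names are hypothetical.

```python
from itertools import product


def booth_encode(b2: int, b1: int, b0: int) -> tuple:
    """Radix-4 Booth encoder for the triplet (y[2i+1], y[2i], y[2i-1]).

    Returns control signals (one, two, neg); the digit is -2*b2 + b1 + b0.
    """
    one = b1 ^ b0
    two = (b2 & (b1 ^ 1) & (b0 ^ 1)) | ((b2 ^ 1) & b1 & b0)
    neg = b2
    return one, two, neg


def select_and_or(one, two, neg, xj, xj_1):
    # Conventional AND-OR selector followed by conditional inversion.
    return ((one & xj) | (two & xj_1)) ^ neg


def select_mux(one, two, neg, xj, xj_1):
    # A 2:1 multiplexer picks x[j] or x[j-1]; the bit is forced to 0 for a
    # zero digit and then conditionally inverted.
    picked = xj_1 if two else xj
    return (picked & (one | two)) ^ neg


def booth_row(x: int, triplet, width: int, select=select_mux):
    """One ones'-complement partial-product row plus its +1 correction bit."""
    one, two, neg = booth_encode(*triplet)
    row = 0
    for j in range(width):
        xj = (x >> j) & 1
        xj_1 = (x >> (j - 1)) & 1 if j > 0 else 0
        row |= select(one, two, neg, xj, xj_1) << j
    return row, neg


# Both selector forms agree, and row + neg == digit * x (mod 2**width).
W = 8
for t in product((0, 1), repeat=3):
    digit = -2 * t[0] + t[1] + t[2]
    for x in range(2 ** W):
        r_mux, neg = booth_row(x, t, W, select_mux)
        r_ao, _ = booth_row(x, t, W, select_and_or)
        assert r_mux == r_ao
        assert (r_mux + neg) % 2 ** W == (digit * x) % 2 ** W
```

The two selectors are interchangeable because `one` and `two` are never both 1, so the multiplexer never has to choose between two active inputs.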

2.
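To reduce the time multiply instructions spend waiting in the reservation station, a 32-bit pipelined multiplier was designed for a superscalar processor developed by the authors. The multiplier uses a modified Booth encoding algorithm with an optimized partial-product generation circuit, and compresses the partial products with a Wallace tree that combines 4-2 and 3-2 compressors. Pipeline registers were then inserted according to the delay of each stage to raise the operating speed. The multiplier was synthesized in a GSMC 0.18 μm process. Simulation shows that it greatly shortens the completion time of multiply instructions waiting in the reservation station, so that a new multiply instruction can be issued to the multiplier every clock cycle.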
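The abstract names 4-2 and 3-2 compressors but not their internal logic. A common textbook construction, used here only as an assumed stand-in for the paper's cells, builds the 4-2 compressor from two cascaded full adders (3-2 counters); the sketch checks exhaustively that it preserves the arithmetic value.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple:
    """3-2 counter: returns (sum, carry) with a + b + c == sum + 2*carry."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """4-2 compressor built from two cascaded 3-2 counters.

    Returns (sum, carry, cout). cout depends only on x1..x3, so it can feed
    the cin of the next-higher column without rippling any further.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


# Exhaustive check: x1 + x2 + x3 + x4 + cin == sum + 2*(carry + cout).
for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
```

Because `carry` and `cout` both carry weight 2, each 4-2 cell turns four same-weight bits (plus a neighbour's carry) into one bit of its own column and two bits of the next.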

3.
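To give FPGA-based signal-processing systems higher operating speed and better-optimized placement and routing, an improved Wallace-tree multiplier architecture suited to the FPGA fabric is proposed. First, an improved 6:4 compressor built from standard-cell 3:2 compressors is discussed. Based on the structure of the FPGA slice, the compressor is logically optimized by editing it in the FPGA Editor tool and mapped onto the slice, the FPGA's basic logic unit. The remaining parts of the multiplier are also optimized and integrated, yielding a multiplier that balances resources and performance reasonably and is easy to implement in an FPGA. Measured results show a critical-path delay below 8.4 ns, substantially improving both the multiplier's clock frequency and overall system performance.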
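The abstract does not define the 6:4 compressor's logic. One plausible reading, assumed here rather than taken from the paper, is that two 3:2 compressors work side by side on six partial-product rows, leaving two sum rows and two carry rows. The word-level sketch below models that reduction on Python integers; it ignores the FPGA slice mapping the paper focuses on, and the function names are hypothetical.

```python
import random


def csa(a: int, b: int, c: int) -> tuple:
    """Word-level 3:2 carry-save adder: a + b + c == s + cy."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1  # carries move one column left
    return s, cy


def reduce_6_to_4(rows):
    """Reduce six rows to four with two independent 3:2 stages (no carry ripple)."""
    assert len(rows) == 6
    s0, c0 = csa(*rows[:3])
    s1, c1 = csa(*rows[3:])
    return [s0, c0, s1, c1]


rows = [random.getrandbits(16) for _ in range(6)]
assert sum(reduce_6_to_4(rows)) == sum(rows)
```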

4.
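Borrowing the partial-product compression idea from array multipliers, a serial floating-point multiplier with a compressor is proposed by modifying the conventional serial multiplier, and the performance of multipliers with different compression-module structures is analyzed. Experiments show that this multiplier effectively improves on the performance of the conventional serial multiplier while occupying less area than an array multiplier.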
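The abstract does not describe the datapath. The sketch below assumes (our assumption) an unsigned significand multiply that retires `k` multiplier bits per iteration, compresses those `k` partial products together with a carry-save accumulator using 3:2 carry-save adders, and performs a single carry-propagate addition at the end. Exponent handling, rounding, and the right-shifting accumulator of real hardware are omitted; `k` and all names are hypothetical.

```python
import random


def csa(a: int, b: int, c: int) -> tuple:
    """Word-level 3:2 carry-save adder: a + b + c == s + cy."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def compress_to_two(rows):
    """Apply 3:2 carry-save adders until two rows remain (no carry propagation)."""
    rows = list(rows)
    while len(rows) > 2:
        s, cy = csa(rows.pop(), rows.pop(), rows.pop())
        rows += [s, cy]
    return rows + [0] * (2 - len(rows))


def serial_multiply(a: int, b: int, n_bits: int, k: int = 4) -> int:
    """Unsigned multiply that consumes k multiplier bits per iteration.

    The running result stays in carry-save form (acc_s, acc_c); each iteration
    compresses it together with k new partial products.
    """
    acc_s, acc_c = 0, 0
    for step in range(0, n_bits, k):
        pps = [a << (step + i) if (b >> (step + i)) & 1 else 0 for i in range(k)]
        acc_s, acc_c = compress_to_two([acc_s, acc_c, *pps])
    return acc_s + acc_c  # final carry-propagate adder


a, b = random.getrandbits(24), random.getrandbits(24)
assert serial_multiply(a, b, 24) == a * b
```

Raising `k` trades more compressor hardware per iteration for fewer iterations, which is the space between a bit-serial and a full array multiplier.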

5.
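To address the large partial-product compression delay in the multiplier of the 32-bit RISC-V "Hummingbird E203" (蜂鸟E203) processor, this paper improves the 5-2 compressor and proposes a Wallace-tree compression structure that combines the new 5-2 compressor with 4-2 compressors to compress the partial products produced by radix-4 Booth encoding. This raises the efficiency of partial-product compression and yields an improved 32-bit signed/unsigned multiplier that reduces the execution cycles of multiply instructions and the multiplier's critical path...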
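As a reference point for the radix-4 Booth partial products mentioned above, the sketch below shows the standard recoding and checks that the weighted digits reconstruct both signed and unsigned 32-bit products. It models only the recoding, not the paper's 5-2/4-2 compression tree; the two-bit zero extension for unsigned operands is a common convention assumed here, and the names are hypothetical.

```python
import random


def booth_radix4_digits(y: int, n: int) -> list:
    """Recode y, read as an n-bit two's-complement number (n even), into n/2
    radix-4 Booth digits in {-2, -1, 0, 1, 2}, least significant first."""
    digits, prev = [], 0  # y[-1] is taken as 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # triplet (y[i+1], y[i], y[i-1])
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n: int = 32, signed: bool = True) -> int:
    """Sum of the Booth partial products d_k * x * 4**k.

    An unsigned n-bit y is zero-extended by two bits so that its top triplet
    never yields a negative digit (n/2 + 1 digits in total).
    """
    width = n if signed else n + 2
    digits = booth_radix4_digits(y, width)
    return sum((d * x) << (2 * k) for k, d in enumerate(digits))


xs, ys = random.getrandbits(32) - 2**31, random.getrandbits(32) - 2**31
assert booth_multiply(xs, ys) == xs * ys
xu, yu = random.getrandbits(32), random.getrandbits(32)
assert booth_multiply(xu, yu, signed=False) == xu * yu
```

The unsigned case is why a signed/unsigned multiplier needs one extra partial-product row (17 instead of 16 for 32 bits).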

6.
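The design of a multiplier module for a fingerprint-recognition application-specific integrated circuit (ASIC) is presented. The module handles 32-bit signed and unsigned multiplication and multiply-accumulate operations. The circuit uses radix-4 Booth encoding and an improved compressor-array structure. The proposed structure, which combines iteration with an array, saves 30% of chip area and raises the operating frequency by 24%. The module was implemented in a TSMC 0.25 μm process and is easy to port to other digital processing systems.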

7.
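Design of Adiabatic 4-2 Compressors and Multipliers Based on CTGAL Circuits   Total citations: 1 (self: 1, others: 0)
After studying the operating principles and structures of parallel multipliers and Clocked Transmission Gate Adiabatic Logic (CTGAL) circuits, a design for an adiabatic 4-2 compressor based on CTGAL is proposed. Compared with a 4-2 compressor in conventional CMOS logic, it saves about 87% of the average power. Building on this, a 4×4-bit adiabatic multiplier is designed; HSPICE simulations show that the circuits are logically correct and exhibit significant energy recovery.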

8.
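张柳, 崔晓平, 董文雯. Acta Electronica Sinica (电子学报), 2018, 46(6): 1519-1523
The demand for high-precision computation in fields such as commercial computing and financial analysis places ever higher requirements on hardware decimal arithmetic. In existing fully redundant decimal multipliers, the complex structure of the fully redundant adder has become a bottleneck to further performance gains. This paper optimizes a fully redundant adder based on the Overloaded Decimal Digit Set (ODDS) to reduce its complexity, and designs a new decimal compression-tree module based on this adder. In the partial-product generation module, signed radix-10 recoding and redundant Binary Coded Decimal (BCD) encoding are used to generate decimal partial products quickly. In the final-product generation module, an optimized code-conversion circuit quickly produces the BCD-8421 product. Experimental results show that the resulting parallel fully redundant decimal multiplier is fast and small in area.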
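The abstract mentions signed radix-10 recoding of the multiplier without details. A widely used form, assumed here, maps each BCD digit to a signed digit in {-5, ..., 5}, so every partial product is one of the multiples {0, X, 2X, 3X, 4X, 5X} with an optional sign. The sketch checks that the recoded digits preserve the value; the ODDS adder and compression tree are not modeled, and the names are hypothetical.

```python
def sd_radix10_recode(digits: list) -> list:
    """Recode decimal digits (least significant first, each 0..9) into signed
    digits in {-5, ..., 5}; the output has one more digit than the input.

    z_i = y_i + t_{i-1} - 10 * t_i, with transfer t_i = 1 when y_i >= 5.
    """
    out, t_prev = [], 0
    for y in digits:
        t = 1 if y >= 5 else 0
        out.append(y + t_prev - 10 * t)
        t_prev = t
    out.append(t_prev)  # the final transfer becomes the extra top digit
    return out


def to_digits(value: int, width: int) -> list:
    """Decimal digits of value, least significant first."""
    return [(value // 10**i) % 10 for i in range(width)]


for value in range(10**4):
    z = sd_radix10_recode(to_digits(value, 4))
    assert all(-5 <= d <= 5 for d in z)
    assert sum(d * 10**i for i, d in enumerate(z)) == value
```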

9.
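Wallace Tree Design Based on an Improved Hybrid Compression Structure   Total citations: 1 (self: 0, others: 1)
Targeting a typical 32-bit floating-point multiplier, this paper regroups the partial products generated by the Booth algorithm and improves the conventional Wallace-tree multiplier with a hybrid circuit structure of carry-save adders (CSAs) and 4-2 compressors, proposing a high-speed tree multiplier array structure. Compared with the conventional Wallace tree, this structure has lower delay and more regular placement and routing, making it easier to implement in VLSI.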
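For comparison with the improved structure, the sketch below models the conventional Wallace-tree reduction the paper starts from, using only full adders (3:2) and half adders on unsigned AND-array partial products. The Booth regrouping and the 4-2 compressors of the proposed hybrid are not modeled. It returns the product and the number of reduction stages; the function name is hypothetical.

```python
import random


def wallace_multiply(a: int, b: int, n: int) -> tuple:
    """Unsigned n x n multiply via column-wise Wallace reduction.

    cols[w] holds the bits of weight 2**w. Each stage groups every column's
    bits in threes (full adder) plus a leftover pair (half adder) until no
    column holds more than two bits; a final carry-propagate add finishes.
    """
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    stages = 0
    while max(len(c) for c in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, col in enumerate(cols):
            k = 0
            while len(col) - k >= 3:  # full adder: 3 bits -> sum + carry
                x, y, z = col[k:k + 3]
                nxt[w].append(x ^ y ^ z)
                nxt[w + 1].append((x & y) | (x & z) | (y & z))
                k += 3
            if len(col) - k == 2:  # half adder on a leftover pair
                x, y = col[k:k + 2]
                nxt[w].append(x ^ y)
                nxt[w + 1].append(x & y)
            elif len(col) - k == 1:  # a single bit passes through
                nxt[w].append(col[k])
        cols, stages = nxt, stages + 1

    # Final carry-propagate adder on the (at most) two remaining rows.
    result = sum(bit << w for w, col in enumerate(cols) for bit in col)
    return result, stages


a, b = random.getrandbits(16), random.getrandbits(16)
assert wallace_multiply(a, b, 16)[0] == a * b
```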

10.
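王江, 黄秀荪, 陈刚, 杨旭光, 仇玉林. Chinese Journal of Electron Devices (电子器件), 2007, 30(1): 162-166
Fixed-point mantissa multipliers and dividers are the core components of the corresponding 32-bit floating-point operations. Targeting industrial-control applications, this paper completes the design with a semi-custom methodology and implements it in a TSMC 0.18 μm process. The multiplier uses radix-4 Booth encoding, reduces the number of partial products by handling the sign bit and the hidden bit, and introduces 4:2 compressors into the Wallace-tree summation to speed it up. The divider uses an improved SRT algorithm with quotient-digit prediction, parallel computation of partial remainders, and a quotient-digit correction-value selection circuit. Both the multiplier and the divider use carry-save adders to increase speed. Back-end physical implementation shows that the multiplier and divider reach 227 MHz and 305 MHz, respectively; the overall design is simple, fast, and accurate.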
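The abstract names an improved SRT divider but gives no selection table. For orientation only, the sketch below implements plain radix-2 SRT division with exact rational arithmetic. The paper's quotient-digit prediction, parallel partial-remainder computation, and correction circuitry are not modeled, and the ±1/2 selection constants are the textbook choice, assumed here.

```python
from fractions import Fraction


def srt_radix2_divide(x: Fraction, d: Fraction, n: int):
    """Textbook radix-2 SRT division with quotient digits in {-1, 0, 1}.

    Requires a normalized divisor 1/2 <= d < 1 and |x| <= d. Comparing the
    shifted remainder 2w with the constants +-1/2 keeps |w| <= d every step.
    """
    assert Fraction(1, 2) <= d < 1 and abs(x) <= d
    w, digits = x, []
    for _ in range(n):
        two_w = 2 * w
        if two_w >= Fraction(1, 2):
            q = 1
        elif two_w < Fraction(-1, 2):
            q = -1
        else:
            q = 0
        w = two_w - q * d  # new partial remainder
        digits.append(q)
    quotient = sum(Fraction(q, 2 ** (i + 1)) for i, q in enumerate(digits))
    return digits, quotient, w


x, d, n = Fraction(3, 8), Fraction(5, 8), 20
digits, q, w = srt_radix2_divide(x, d, n)
assert x == q * d + w / 2 ** n  # exact remainder identity
assert abs(x / d - q) <= Fraction(1, 2 ** n)
```

The digit set {-1, 0, 1} is what lets hardware select each quotient digit from a few leading bits of a carry-save remainder, which fits the carry-save adders the abstract mentions.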

11.
3:2 counters and 4:2 compressors have been widely used in multiplier implementations. In this paper, a fast 5:3 compressor is derived for high-speed multiplier implementations. The fast 5:3 compression is obtained by applying two rows of fast 2-bit adder cells to five rows of a partial-product matrix. As a design example, a 16-bit by 16-bit MAC (multiply-and-accumulate) design is investigated both as a purely logic-gate implementation and as a highly customized design. For partial-product reduction, the new 5:3 compression yields a 14.3% speed improvement in terms of XOR gate delay. In a dynamic CMOS circuit implementation using 0.225 μm bulk CMOS technology, the reduction tree shows an 11.7% speed improvement with 8.1% lower power consumption.

12.
This paper presents several architectures and designs of low-power 4-2 and 5-2 compressors capable of operating at ultra-low supply voltages. These compressor architectures are decomposed into their constituent modules, which are realized in different static logic styles based on the same deep-submicrometer CMOS process model. Different configurations of each architecture, including a number of novel 4-2 and 5-2 compressor designs, are prototyped and simulated to evaluate their speed, power dissipation, and power-delay product. The newly developed circuits are based either on various configurations of the novel 5-2 compressor architecture with the new carry-generator circuit, or on existing architectures configured with the proposed circuit for the exclusive-OR/exclusive-NOR (XOR-XNOR) module. The proposed XOR-XNOR circuit eliminates the weak logic on the internal nodes of the pass transistors with a pair of feedback PMOS-NMOS transistors. Driving capability is considered in both the design and the simulation setup, so that these 4-2 and 5-2 compressor cells can operate reliably in any tree-structured parallel multiplier at very low supply voltages. Two new simulation environments are created to ensure that the measured performance reflects realistic circuit operation in the system into which the cells are integrated. Simulation results show that the 4-2 compressor with the proposed XOR-XNOR module and the new fast 5-2 compressor architecture can function at supply voltages as low as 0.6 V, and outperform many other architectures, including classical CMOS logic compressors and compressor variants built from various combinations of recently reported superior low-power logic cells.
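The abstract evaluates circuit-level realizations rather than logic equations. As a logic-level reference only, the sketch below models one common 5-2 compressor organization, three cascaded full adders (an assumption, not the paper's new architecture), and checks exhaustively that it preserves the input sum and that its outgoing carries are independent of its incoming ones.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple:
    """3-2 counter: a + b + c == sum + 2*carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_5_2(x1, x2, x3, x4, x5, cin1, cin2):
    """5-2 compressor from three cascaded full adders.

    Returns (sum, carry, cout1, cout2). cout1 and cout2 depend only on x1..x5,
    so they can drive cin1/cin2 of the next column without a ripple chain.
    """
    s1, cout1 = full_adder(x1, x2, x3)
    s2, cout2 = full_adder(s1, x4, x5)
    s, carry = full_adder(s2, cin1, cin2)
    return s, carry, cout1, cout2


for bits in product((0, 1), repeat=7):
    s, carry, cout1, cout2 = compressor_5_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout1 + cout2)
    # The outgoing carries must not change when the incoming carries do.
    _, _, c1_alt, c2_alt = compressor_5_2(*bits[:5], 1 - bits[5], 1 - bits[6])
    assert (cout1, cout2) == (c1_alt, c2_alt)
```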

13.
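Hardware Design of a Three-Stage Pipelined Wallace-Tree Compressor   Total citations: 3 (self: 0, others: 3)
This paper proposes a three-stage pipelined Wallace-tree compressor for 32-bit floating-point multiplication. 4-2 and 3-2 compressors are designed first and then assembled into a Wallace-tree compressor; the entire partial-product compression is split into three pipeline stages, greatly speeding up mantissa processing in floating-point operations. The compressor uses a modular design described in VHDL. Waveform simulation was performed with ModelSim XE II 5.6a, and the Synplify/Synplify Pro synthesis tools were used to compare the synthesis results of Wallace-tree compressors built from two different 4-2 compressor cells, selecting the better one. The compressor has been used as a compression module in a soft-core design of a 32-bit floating-point multiplier, with good results.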
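The abstract does not state how the tree is divided into three pipeline stages. A rough row-level count, under our own assumptions, shows why few levels suffice: the sketch below applies 4-2 compressors greedily and a 3-2 counter to a leftover group of three. Assuming 12 or 13 partial-product rows (for example, from radix-4 Booth recoding of a 24-bit single-precision significand, which the abstract does not specify), it reaches two rows in three levels.

```python
def tree_schedule(rows: int) -> list:
    """Row count after each reduction level of a 4-2/3-2 compressor tree.

    Per level, every group of four rows passes through 4-2 compressors
    (4 -> 2); a leftover group of three uses 3-2 counters (3 -> 2); one or two
    leftover rows pass through. Column offsets and cin/cout chains are ignored.
    """
    history = [rows]
    while rows > 2:
        fours, rest = divmod(rows, 4)
        rows = 2 * fours + (2 if rest == 3 else rest)
        history.append(rows)
    return history


print(tree_schedule(12))  # [12, 6, 4, 2]
print(tree_schedule(13))  # [13, 7, 4, 2]
```

Each entry after the first corresponds to one compressor level, so a register placed after every level gives one possible three-stage split.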

14.
The use of signed-digit number systems in arithmetic circuits has the advantage of constant-time addition. When signed-digit number systems are used in binary, they are referred to as redundant binary (RB). Here, we present a new encoding technique for generating redundant binary partial products for a multiplier without any additional hardware. We express each normal binary partial product in ones' complement form, with an extra bit denoting the sign bit. The proposed redundant binary partial product generator (RBPPG) achieves the highest reduction in the number of partial products (75%) for a radix-4 multiplier. The carry-free nature of redundant binary adders is exploited to add the extra bits to the partial products without any extra adder stages. The new partial product generation (PPG) technique is shown to improve multiplier speed with the fewest adder stages, irrespective of multiplier size.
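The paper's exact encoding is not reproduced here. The sketch below shows only a generic principle, assumed for illustration, behind cheap redundant-binary partial-product generation: because ~B = -B - 1, two binary partial products A and B can be stored as one RB operand with digits A_i - (~B)_i in {-1, 0, 1}, halving the operand count at the cost of one constant correction per pair (folded here into the final sum modulo 2^W). Carry-free RB addition itself is not modeled, and the names are hypothetical.

```python
import random


def pack_rb(a: int, b: int, width: int) -> tuple:
    """Pack two width-bit binary partial products into one redundant-binary
    operand (plus, minus) whose i-th digit is plus_i - minus_i.

    Since ~b == -b - 1, a + b == a - (~b) - 1: the second operand is stored
    inverted in the minus vector, and a constant -1 corrects the total.
    """
    mask = (1 << width) - 1
    return a & mask, ~b & mask


def rb_value(plus: int, minus: int) -> int:
    # Converting an RB operand back to binary is a single subtraction.
    return plus - minus


W = 16
pps = [random.getrandbits(W) for _ in range(8)]                # 8 binary operands
rb = [pack_rb(pps[i], pps[i + 1], W) for i in range(0, 8, 2)]  # 4 RB operands
total = sum(rb_value(p, m) for p, m in rb) - len(rb)           # one -1 per pair
assert total % 2**W == sum(pps) % 2**W
```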

15.
The parallel multiplier is one of the most important building blocks in DSP processors, which require fast computation. To reduce the total transistor count of a multiplier, we propose two new approaches. The first uses a 26-transistor Booth encoder and an 8-transistor/partial-product Booth selector to generate partial products. The second is a new circuit for 4:2 compressors. The Booth encoder and Booth selector reported here are the smallest in transistor count, with delay comparable to the best reported designs and lower power consumption. This paper describes a comparison for a compact 16 × 16 parallel multiplier built with the new circuit components, which shows a transistor-count advantage of 27% in partial-product generation and 52% in partial-product accumulation.

16.
A 3-2 counter and a 4-2 compressor are the basic components of the partial-product summation tree in a parallel array multiplier. A new high-speed, low-power design of these components is presented. Owing to the reduced internal load capacitance, the counter and compressor achieve better speed and power performance than other recently proposed approaches.

17.
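Floating-point multipliers are key arithmetic units in systems such as high-dynamic-range (HDR) image processing and wireless communications; compared with fixed-point multipliers, they offer a wider dynamic range but are more complex. Approximate computing, an emerging paradigm, can greatly reduce hardware resources and power consumption within a bounded loss of accuracy. This paper proposes a 16-bit half-precision approximate floating-point multiplier (App-Fp-Mul). For the mantissa multiplication module, and based on the probability that a 1 appears in the partial-product array, it proposes an input-order-insensitive approximate 4-2 compressor and an OR-gate compression method for the low-order bits, effectively reducing the multiplier's resources and power at a small cost in accuracy. Compared with the exact design, at a normalized mean error distance (NMED) of 0.0014 the proposed multiplier reduces area by 20% and power-delay product by 58%; compared with existing approximate designs at the same approximate bit width, it achieves higher accuracy and a lower power-delay product. Finally, when applied to HDR image processing, it reaches a peak signal-to-noise ratio of 83.16 dB and a structural similarity of 99.9989%, a significant improvement over existing mainstream schemes.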
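The abstract does not list the proposed compressor's logic. To illustrate the two ideas it names, input-order insensitivity and exploiting how rarely partial-product bits are 1, the sketch below uses a hypothetical approximate 4-2 compressor (our own example, not App-Fp-Mul's cell) whose output depends only on the number of ones. If each partial-product bit is the AND of two uniformly random bits, it is 1 with probability 1/4, so the error occurs with probability (1/4)^4 = 1/256.

```python
from itertools import permutations, product
from math import prod


def approx_4_2(x1: int, x2: int, x3: int, x4: int) -> tuple:
    """Hypothetical order-insensitive approximate 4-2 compressor.

    Outputs (sum, carry) with 2*carry + sum == min(x1+x2+x3+x4, 3). The exact
    cell's cin/cout are dropped, so the only error is -1 when all inputs are 1.
    """
    k = x1 + x2 + x3 + x4
    carry = 1 if k >= 2 else 0
    s = 1 if k in (1, 3, 4) else 0
    return s, carry


def error_stats(p: float) -> tuple:
    """(error probability, mean error distance) when each input is 1 with prob. p."""
    err_prob = med = 0.0
    for bits in product((0, 1), repeat=4):
        weight = prod(p if b else 1 - p for b in bits)
        s, carry = approx_4_2(*bits)
        ed = abs(2 * carry + s - sum(bits))
        if ed:
            err_prob += weight
            med += weight * ed
    return err_prob, med


# Order insensitivity: every permutation of the inputs gives the same output.
for bits in product((0, 1), repeat=4):
    assert len({approx_4_2(*perm) for perm in permutations(bits)}) == 1
```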

18.
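With the rapid development of cloud computing, the Internet of Things, and artificial intelligence, end devices face major challenges in hardware resources and energy consumption. To reduce the power of arithmetic units, this paper proposes two low-power approximate multipliers based on a new 4-1 compressor. By analyzing the error of the 4-1 compressor, an error-compensation unit is designed and applied in the multipliers, reducing their accuracy loss. Simulation results show that, compared with an exact multiplier, the two proposed 8-bit unsigned approximate multipliers reduce delay by 5.67% and 18.23%, area by 6.54% and 20.36%, and power by 15.83% and 30.94%, respectively. Finally, in image-sharpening experiments the proposed designs perform well, demonstrating their effectiveness in error-tolerant applications.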
