首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到17条相似文献,搜索用时 205 毫秒
1.
为了提高乘法器性能,采用基4 Booth编码算法设计Booth编码器,使用华莱士树压缩结构设计16 bit有符号数乘法器;针对部分积生成的复杂过程提出一种新的部分积生成器,同时进行部分积的产生与选择,提高了部分积生成效率;针对压缩过程中的资源浪费,提出一种部分积提前压缩器,将某几位部分积在进入压缩树之前进行合并,减少了压缩单元的使用。基于28 nm工艺对乘法器进行逻辑综合,关键路径延时为0.77 ns,总面积为937.3μm2,功耗为935.71μW,能够较好地提升乘法器的面积利用率和运算性能。  相似文献   

2.
基于优化电路的高性能乘法器设计   总被引:1,自引:1,他引:0  
为了提高二进制乘法器的速度并降低其功耗,在乘法器的部分积产生模块采用了改进的基4Booth编码和部分积产生电路并在部分积压缩模块应用了7∶3压缩器电路,设计并实现了一种高性能的33×28二进制乘法器.在TSMC 90 nm工艺和0.9 V工作电压下,仿真结果与Synopsys公司module compiler生成的乘法器相比,部分积产生电路速度提高34%,7∶3压缩器和其他压缩器的结合使用减少了约一级全加器的延时,整体乘法器速度提高约17.7%.  相似文献   

3.
利用阵列乘法器中的压缩部分积的思想,通过对传统的串行执行乘法器的改造,提出了一种带压缩器的串行执行浮点乘法器,分析了具有不同压缩模块结构的乘法器的性能.实验表明,该乘法器可以有效地提高传统的串行乘法器的性能,而面积要小于阵列乘法器.  相似文献   

4.
介绍了一种可嵌入微控制器的8位乘法器的设计.采用基4 Booth算法产生部分积,用一种改进的压缩阵列结构压缩部分积;同时,采用一种减少符号扩展的技术,优化压缩结构的面积,最终对压缩的数据采用超前进位加法器求和电路得到乘积.整个设计采用Verilog HDL进行结构级描述,基于SMIC 0.18 μm标准单元库,由Synopsys的DC进行逻辑综合.结果显示,设计的乘法器电路时间延迟为5.31 ns,系统时钟频率达188 MHz.  相似文献   

5.
提出了一种基于静态分段补偿方法的近似乘法器。通过基于静态分段方法的Booth编码方法生成部分积阵列,并对生成的部分积阵列进行误差补偿优化以及近似压缩,以实现硬件性能和精度的折中。仿真结果显示,相比于综合工具生成的全精度乘法器,本设计在保持了较高精度水平的前提下,面积和功耗优化的比例达到了36.96%和35.95%。在图片边缘检测应用中,设计的峰值信噪比和结构相似性指标分别为26.10和98%,可见本设计在降低硬件资源消耗的同时,应用效果接近全精度乘法器。  相似文献   

6.
为了进一步降低乘法器运算过程中的延迟,减少功耗,在行旁路乘法器的基础上进一步优化,提出一种并行行旁路(PRB)乘法器,并用有限状态机进行了实现.在行旁路的基础上,通过对乘数进行重新编码并行输出部分积,使乘法运算中产生的部分积数量减少,提高运算速度;利用有限状态机实现PRB乘法器,有效减少了电路中逻辑元件的数量,降低了功耗.在Quartus平台上进行的仿真表明PRB乘法器在整体性能上有较大的改善.  相似文献   

7.
浮点乘法器是高动态范围(HDR)图像处理、无线通信等系统中的关键运算单元,其相比于定点乘法器动态范围更广,但复杂度更高。近似计算作为一种新兴范式,在受限的精度损失范围内,可大幅降低硬件资源和功耗开销。该文提出一种16 bit半精度近似浮点乘法器(App-Fp-Mul),针对浮点乘法器中的尾数乘法模块,根据其部分积阵列中出现1的概率,提出一种对输入顺序不敏感的近似4-2压缩器及低位或门压缩方法,在精度损失较小的条件下有效降低了浮点乘法器资源及功耗。相较于精确设计,所提近似浮点乘法器在归一化平均错误距离(NMED)为0.0014时,面积及功耗延时积方面分别降低20%及58%;相较于现有近似设计,在近似位宽相同时具有更高的精度及更小的功耗延时积。最后将该文所提近似浮点乘法器应用于高动态范围图像处理,相比现有主流方案,峰值信噪比和结构相似性分别达到83.16 dB 和 99.9989%,取得了显著的提升。  相似文献   

8.
针对32位RISC-V"蜂鸟E203"处理器的乘法器部分积压缩延时较大的问题,该文改进5-2压缩器,提出一种由新型5-2压缩器和4-2压缩器相结合的Wallace树形压缩结构,压缩基4 Booth编码产生的部分积,提高部分积压缩的压缩效率,优化设计出一种改进的32位有/无符号乘法器,减少乘法指令执行周期和乘法器关键路径...  相似文献   

9.
32×32高速乘法器的设计与实现   总被引:3,自引:2,他引:1  
设计并实现了一种32×32高速乘法器.本设计通过改进的基4 Booth编码产生部分积,用一种改进的Wallace树结构压缩部分积,同时采用一种防止符号扩展的技术有效地减小了压缩结构的面积.整个设计采用Vetilog HDL进行了结构级描述,用SIMC 0.18μm标准单元库进行逻辑综合.时间延迟为4.34 ns,系统时钟频率可达230 MHz.  相似文献   

10.
随着物联网的快速发展,智能终端设备在硬件资源和供电上受到较强限制,迫切需要低功耗的新型运算单元。针对运算单元功耗高的问题,提出了一种基于近似压缩器的低功耗近似乘法器,用于图像处理、深度学习等可容错应用领域。实验结果表明,相比于现有近似乘法器,该近似乘法器降低了30.70%的功耗和26.50%的延迟,节省了30.23%的芯片面积,在功耗延迟积(PDP)和能量延迟积(EDP)方面均优化了43%以上。在计算精度方面同样具有一定优势。最后,在图像滤波应用中验证了该近似乘法器的有效性。  相似文献   

11.
Cho  K.-J. Chung  J.-G. 《Electronics letters》2007,43(25):1414-1416
The partial product matrix of a parallel squarer is symmetric. To reduce the depth of the partial product matrix, it can be typically folded, shifted and merged. A high performance parallel squarer design technique using pre-calculated sums of some partial products is presented. It is shown that the proposed method reduces the area, propagation delay and power consumption compared with previous squarers.  相似文献   

12.
《Microelectronics Journal》2007,38(4-5):482-488
This paper presents the design of high performance and low power arithmetic circuits using a new CMOS dynamic logic family, and analyzes its sensitivity against technology parameters for practical applications. The proposed dynamic logic family allows for a partial evaluation in a computational block before its input signals are valid, and quickly performs a final evaluation as soon as the inputs arrive. The proposed dynamic logic family is well suited to arithmetic circuits where the critical path is made of a large cascade of inverting gates. Furthermore, circuits based on the proposed concept perform better in high fanout and high switching frequencies due to both lower delay and dynamic power consumption. Experimental results, for practical circuits, demonstrate that low power feature of the propose dynamic logic provides for smaller propagation time delay (3.5 times), lower energy consumption (55%), and similar combined delay, power consumption and active area product (only 8% higher), while exhibiting lower sensitivity to power supply, temperature, capacitive load and process variations than the dynamic domino CMOS technologies.  相似文献   

13.
三种改进结构型BiCMOS逻辑单元的研究   总被引:6,自引:2,他引:6  
为满足低压、高速、低耗数字系统的应用需求 ,通过采用改进电路结构和优化器件参数的方法 ,设计了三种改进结构型BiCMOS逻辑单元电路。实验结果表明 ,所设计电路不但具有确定的逻辑功能 ,而且获得了高速、低压、低耗和接近于全摆幅的特性 ,它们的工作速度比高速CMOS和原有的互补对称BiCMOS(CBiCMOS)电路快约一倍 ,功耗在 6 0MHz频率下仅高出 1 4 9~ 1 71mW ,但延迟 功耗积却比原CBiCMOS电路平均降低了4 0 3%。  相似文献   

14.
The conventional modified Booth encoding (MBE) generates an irregular partial product array because of the extra partial product bit at the least significant bit position of each partial product row. In this brief, a simple approach is proposed to generate a regular partial product array with fewer partial product rows and negligible overhead, thereby lowering the complexity of partial product reduction and reducing the area, delay, and power of MBE multipliers. The proposed approach can also be utilized to regularize the partial product array of posttruncated MBE multipliers. Implementation results demonstrate that the proposed MBE multipliers with a regular partial product array really achieve significant improvement in area, delay, and power consumption when compared with conventional MBE multipliers.   相似文献   

15.
This work presents low-power 2's complement multipliers by minimizing the switching activities of partial products using the radix-4 Booth algorithm. Before computation for two input data, the one with a smaller effective dynamic range is processed to generate Booth codes, thereby increasing the probability that the partial products become zero. By employing the dynamic-range determination unit to control input data paths, the multiplier with a column-based adder tree of compressors or counters is designed. To further reduce power consumption, the two multipliers based on row-based and hybrid-based adder trees are realized with operations on effective dynamic ranges of input data. Functional blocks of these two multipliers can preserve their previous input states for noneffective dynamic data ranges and thus, reduce the number of their switching operations. To illustrate the proposed multipliers exhibiting low-power dissipation, the theoretical analyzes of switching activities of partial products are derived. The proposed 16 /spl times/ 16-bit multiplier with the column-based adder tree conserves more than 31.2%, 19.1%, and 33.0% of power consumed by the conventional multiplier, in applications of the ADPCM audio, G.723.1 speech, and wavelet-based image coders, respectively. Furthermore, the proposed multipliers with row-based, hybrid-based adder trees reduce power consumption by over 35.3%, 25.3% and 39.6%, and 33.4%, 24.9% and 36.9%, respectively. When considering product factors of hardware areas, critical delays and power consumption, the proposed multipliers can outperform the conventional multipliers. Consequently, the multipliers proposed herein can be broadly used in various media processing to yield low-power consumption at limited hardware cost or little slowing of speed.  相似文献   

16.
Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are popularly employed in both traditional research fields, such as satellite communications, and thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high throughput and power efficient FFT cores. Different combinations of hybrid low‐power techniques are exploited to reduce power consumption, such as multiplierless units which replace the complex multipliers in FFTs, low‐power commutators based on an advanced interconnection, and parallel‐pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that up to 38% and 55% power savings can be achieved by the proposed pipelined FFTs and parallel‐pipelined FFTs respectively, compared to the conventional pipelined FFT processor architectures.  相似文献   

17.
Parallel multiplier is one of the most important building blocks in all the DSP processors, which needs faster computations. To reduce the total transistor count in a multiplier we have proposed two new approaches. The first approach is using a 26 transistor booth encoder and a 8-transistor/partial-product booth selector to generate partial products. The second approach proposes a new circuit for 4 : 2 compressors. The booth encoder and booth selector reported here are the smallest in transistor count, but comparable to the best delay with less power consumption. This paper describes a comparison of a compact 16 × 16 parallel multiplier using the new circuit components. This shows a transistor count advantage of 27% and 52% in partial product generation and partial product accumulation, respectively.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号