共查询到20条相似文献,搜索用时 287 毫秒
1.
2.
在余数系统中(2^n-1)是最普遍应用的模,提出了一种新的booth编码结构,并基于提出的booth编码结构,提出了一种高速模(2^n-1)乘法器.该乘法器采用CSA或者wallace Tree结构可以进一步提高运算速度.此乘法器在一个时钟周期内可以完成所需运算,简单高效. 相似文献
3.
4.
“ENOD”是某公司2006年设计的一款32位嵌入式RISC微处理器。其中的硬件乘法器位于设计的关键时序路径上,为优化乘法器的时序和提高其灵活性,采用Radix4-Booth算法,设计了单周期、流水线和多周期3种乘法器结构,在Modelsim中进行了功能仿真和时序仿真。采用中芯国际0.18μm的标准单元库将它们分别在DC中综合后,从功能、面积、速度等方面对这3种乘法器结构做了定量分析,指出了它们各自的优缺点及应用场合。在“ENOD”的应用中,根据具体的应用通过设置参数选择最合适的乘法器结构,灵活性好,性能/面积比高。 相似文献
5.
6.
7.
针对软件实现浮点运算的速度无法满足RISC-V嵌入式处理器浮点运算的需求,设计了一种由浮点加法器和浮点乘法器构成的浮点单元(FPU),其中浮点乘法器提出了新型的Wallace树压缩结构,提高了压缩速率。在“蜂鸟E203”处理器中,完成浮点指令的译码模块与派遣模块的设计,实现FPU模块的移植。基于Simc180 nm工艺,使用Sysnopsys公司的Design Compile、VCS工具对FPU进行功能验证和综合,仿真结果表明,浮点加法器的关键路径延时为10.17 ns,相比于串行浮点加法器延时缩短23%,浮点乘法器的压缩结构关键路径延时为0.27 ns,相比传统Wallace树压缩延时缩短10%,移植前后的FPU运算结果一致。 相似文献
8.
在数字信号处理中经常需要进行乘法运算,乘法器的设计对整个器件的性能有很大的影响,在此介绍20×18比特定点阵列乘法器的设计。采用基4-Booth算法和4—2压缩的方案,并采用先进的集成电路工艺,使用SMIC0.18μm标准单元库,提高了乘法器的速度,节省了器件。利用Xilinx FPGA(xc2vp70-6ff1517)对乘法器进行了综合仿真,完成一次乘法运算的时间为15.922ns,在减少乘法器器件的同时,提高了乘法器的速度,降低了器件的功耗。 相似文献
9.
为了减少乘法指令在保留站中的等待时间,设计了一款32位流水线型乘法器,该乘法器将应用于作者设计的一款超标量处理器中.该乘法器应用了改进型的booth编码算法,对部分积生成电路进行了优化,并采用了4-2压缩器与3-2压缩器相结合的Wallace树型结构对部分积进行压缩,最后再根据各级的延迟,在电路中插入了流水线寄存器,使其运算速度得到了提高.该乘法器使用GSMC 0.18μm工艺进行综合.经过仿真验证,该乘法器大大减少了在保留站中等待执行的乘法指令的完成时间,使每个时钟周期都有一条新的乘法指令被发送至乘法器进行运算. 相似文献
10.
11.
12.
Benschneider B.J. Bowhill W.J. Copper E.M. Gavrielov M.N. Gronowski P.E. Maheshwari V.K. Peng V. Pickholtz J.D. Samudrala S. 《Solid-State Circuits, IEEE Journal of》1989,24(5):1317-1323
A 135K transistor, uniformly pipelined 50-MHz CMOS 64-bit floating-point arithmetic processor chip is described. The execution unit is capable of sustaining pipelined performance of one 32-bit or 64-bit result every 20 ns for all operations except double-precision multiply (40 ns) and divide. The chip employs an exponent difference prediction scheme and a unified leading-one and sticky-bit computation logic for the addition and subtraction operations. A hardware multiplier using a radix-8 modified Booth algorithm and a divider using a radix-2 SRT algorithm are employed.<> 相似文献
13.
运用流水线技术对单精度浮点乘法和加法运算单元进行了优化设计。浮点加法器采用了改进的双路径结构,重点对移位单元和前导1检测单元的结构进行了优化。浮点乘法器在对被乘数进行Booth编码后,采用改进的4-2压缩器构成Wallace树,在简化逻辑的同时,提高了系统的吞吐率。经过仿真验证,在Virtex-4系列FPGA(现场可编程门阵列)上,浮点加法器的最高运行速率达到405MHz,浮点乘法器的最高运行速率达到429MHz。 相似文献
14.
Vojin G. Oklobdzija David Villeger Thierry Soulas 《The Journal of VLSI Signal Processing》1994,7(3):213-222
In this article we consider a design of a multiplier for the multiplication of complex numbers. The complex numbers are packed
into one 32-bit word. They are represented by two 13-bit parts with the same 6-bit exponent. Multiplication of complex numbers
is examined from the perspectives of performance, complexity and silicon area. The design is unique and combines shared Booth
encoding for the real and imaginary parts including only one combined modified Wallace tree of 4:2 adders for each part. The
regular Wallace tree is compared with the tree of 4:2 adders. This design results in a more compact wiring structure and balanced
delays resulting in a faster multiplier circuit. The number of adders used in the multiplier is also reduced. We consider
VLSI CMOS technology and the relevant characteristics as they impact the implementation and performance. 相似文献
15.
《Solid-State Circuits, IEEE Journal of》1985,20(5):998-1004
A general-purpose programmable digital signal processor (DSP) has been implemented in 1.5-/spl mu/m (L/SUB eff/) NMOS technology using full-custom circuit design for high performance. The DSP has a 32-bit instruction set, 32-bit data path, and full-hardware 32-bit floating-point arithmetic. The architecture is described section by section, and an overview of the instruction set is presented. The extensive design verification process applied to the DSP is also described. 相似文献
16.
17.
Hwa-Joon Oh Mueller S.M. Jacobi C. Tran K.D. Cottier S.R. Michael B.W. Nishikawa H. Totsuka Y. Namatame T. Yano N. Machida T. Dhong S.H. 《Solid-State Circuits, IEEE Journal of》2006,41(4):759-771
The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical single-precision multiply-add operations. Since exact rounding, exceptions, and de-norm number handling are not important to multimedia applications, IEEE correctness on the single-precision floating-point numbers is sacrificed for performance and simple design. It employs fine-grained clock gating for power saving. The design has 768K transistors in 1.3 mm/sup 2/, fabricated SOI in 90-nm technology. Correct operations have been observed up to 5.6 GHz with 1.4 V and 56/spl deg/C, delivering 44.8 GFlops. Architecture, logic, circuits, and integration are codesigned to meet the performance, power, and area goals. 相似文献
18.
《Solid-State Circuits, IEEE Journal of》1987,22(1):28-34
A 16-bit /spl times/ 16-bit multiplier for 2 two's-complement binary numbers based on a new algorithm is described. This multiplier has been fabricated on an LSI chip using a standard n-E/D MOS process technology with a 2.7-/spl mu/m design rule. This multiplier is characterized by use of a binary tree of redundant binary adders. In the new algorithm, n-bit multiplication is performed in a time proportional to log/SUB 2/ n and the physical design of the multiplier is constructed of a regular cellular array. This new algorithm has been proposed by N. Takagi et al. (1982, 1983). The 16-bit/spl times/16-bit multiplier chip size is 5.8 /spl times/ 6.3 mm/SUP 2/ using the new layout for a binary adder tree. The chip contains about 10600 transistors, and the longest logic path includes 46 gates. The multiplication time was measured as 120 ns. It is estimated that a 32-bit /spl times/ 32-bit multiplication time is about 140 ns. 相似文献
19.
通过分析FPGA可配置逻辑块的细致结构,提出了一种基于FPGA的细粒度映射方法,并使用该方法高效实现了大数模乘脉动阵列.在保持高速计算特点的同时,将模乘脉动阵列的资源消耗降低为原来的三分之一.在低成本的20万门级FPGA器件中即可实现1024位模乘器.该实现每秒可进行20次RSA签名.如果换用高性能FPGA,签名速度更可提高至每秒40次. 相似文献
20.
基于FPGA的32位浮点FFT处理器的设计 总被引:5,自引:3,他引:5
介绍了一种基于FPGA的1024点32位浮点FFT处理器的设计。采用改进的蝶形运算单元,减小了系统的硬件消耗,改善了系统的性能。详细讨论了32位浮点加法器/减法器、乘法器的分级流水技术,提高了系统性能。浮点算法的采用使得系统具有较高的处理精度。 相似文献