共查询到20条相似文献,搜索用时 187 毫秒
1.
硬件设计中发展了许多除法运算算法,各算法在商收敛性速度、基本硬件单元和数学公式等许多方面均不相同。通过对现在较流行的浮点除法和平方根运算算法进行介绍,分析各浮点除法和平方根运算算法的思路和适合的不同场合,比较各自的优缺点。举例说明LSFT32处理器中浮点除法算法的选择。只有当算法的思路及其特点与运算器的结构相匹配时才能充分发挥速度和规模的优势,所选用的算法才是有意义的。 相似文献
2.
3.
硬件设计中发展了许多除法运算算法,各算法在商收敛性速度、基本硬件单元和数学公式等许多方面均不相同。通过对现在较流行的浮点除法和平方根运算算法进行介绍,分析各浮点除法和平方根运算算法的思路和适合的不同场合,比较各自的优缺点。举例说明LSFT32处理器中浮点除法算法的选择。只有当算法的思路及其特点与运算器的结构相匹配时才能充分发挥速度和规模的优势,所选用的算法才是有意义的。 相似文献
4.
针对传统浮点融合乘加器会增加独立浮点加减法、乘法等运算延迟的缺点,首先设计并实现了一种分离通路浮点乘加器SPFMA,通过分离乘法和加法通路,在保持融合乘加运算延迟6拍延迟不变的情况下,将独立乘法和加法等运算延迟由6拍减为4拍,克服了传统融合乘加器的缺点。然后经专用工艺单元库逻辑综合评估,SPFMA可工作在1.2GHz以上,面积60779.44um2。最后在硬件仿真加速器平台上运行SPEC CPU2000浮点测试课题对其进行性能评估,结果表明所有浮点课题性能均有所提高,最大提高5.25%,平均提高1.61%,证明SPFMA可进一步提高浮点性能。 相似文献
5.
一种高效结构的多输入浮点加法器在FPGA上的实现 总被引:3,自引:1,他引:3
传统的多输入浮点加法运算是通过级联二输入浮点加法器来实现的,这种结构不可避免地使运算时延和所需逻辑资源成倍增加,从而越来越难以满足需要进行高速数字信号处理的需求。本文提出了一种适合在FPGA上实现的浮点数据格式和可以在四级流水线内完成的一种高效多输入浮点加法器结构,并给出了在Xilinx公司Virtex系列芯片上的测试
试数据。 相似文献
试数据。 相似文献
6.
随着面向数字信号处理以及其他相关领域的专用微处理技术的发展,浮点乘加运算变得日益重要。该操作将乘法和加法相融合,节省了整个运算的执行延时。基于多通路的思想,本文提出一种改进的多通道浮点乘加器结构。根据对阶时A相对于B×C乘积的位置,将整个处理过程分为四条数据通路,采用不同的数据处理通路,避免了不必要的处理延时。通过对比得出:多通道浮点乘加器无论在速度以及功耗上,都具有一定的优势。 相似文献
7.
8.
基于Microblaze处理器的浮点内积运算设计 总被引:1,自引:0,他引:1
浮点内积运算在信号处理与图像处理中有着广泛的应用,本文利用软核处理器灵活性和可扩展性的特点,介绍了基于Microblaze处理器的浮点内积运算结构,设计采用IEEE-754双精度浮点数,通过对DSA电路改进设计出了适合于内积运算的累加电路结构。通过EDK设计平台,在SOPC系统中把内积运算单元通过FSL总线挂载到Microblaze软核处理器上,实现了硬件单元的调用。 相似文献
9.
10.
11.
Floating-point fast Fourier transform (FFT) has been widely expected in scientific computing and high-resolution imaging applications due to the wide dynamic range and high processing precision. However, it suffers high area and energy overhead problems in comparison to fixed-point implementations. To address these issues, this paper presents an area- and energy-efficient hybrid architecture for floating-point FFT computations. It minimizes the required arithmetic units and reduces the memory usage significantly by combining three different parts. The serial radix-4 butterfly (SR4BF) is used in the single-path delay commutator (SDC) part to minimize the required arithmetic units with 100% adder utilization ratio obtained. A modified single-path delay feedback (MSDF) architecture is proposed to achieve a tradeoff between arithmetic resources and memory usage by using the new half radix-4 butterfly (HR4BF) with 50% adder utilization ratio obtained. The intermediate caching buffer is modified accordingly in the MSDF part. By combining both the advantages on arithmetic units reducing and memory usage optimization in different parts, the optimized area and power are obtained without throughput loss. The logic synthesis results in a 65 nm CMOS technology show that the energy per FFT is about 331.5 nJ for 1024-point FFT computations at 400 MHz. The total hardware overhead is equivalent to 460k NAND2 gates. 相似文献
12.
本文介绍了浮点加法器(FPA)的基本运算步骤,归纳阐述了传统的多输入浮点加法器算法,提出了一种改进的并行多输入浮点加法器算法。采用这种改进的算法可以有效地提高运算速度并减少逻辑资源。 相似文献
13.
高精度、高性能浮点运算部件是高性能微处理器设计的重要部分。通过对传统双精度浮点乘加运算算法的研究,结合四倍精度浮点数据格式特点,设计并实现一种高性能的四倍精度浮点乘加器(QPFMA),该乘加器支持多种浮点运算,运算延迟为7拍,全流水结构。采用双路加法器改进算法结构,优化头零预测和规格化移位逻辑,减小运算延迟和硬件开销。通过参数化设计验证方法,实现高效的正确性验证。逻辑综合结果表明,基于65 nm工艺,该QPFMA频率可达1.2 GHz,比现有的QPFMA设计运算延迟减少3拍,频率提高约11.63%。 相似文献
14.
Most modern microprocessors provide multiple identical functional units to increase performance. This paper presents dual-mode floating-point adder architectures that support one higher precision addition and two parallel lower precision additions. A double precision floating-point adder implemented with the improved single-path algorithm is modified to design a dual-mode double precision floating-point adder that supports both one double precision addition and two parallel single precision additions. A similar technique is used to design a dual-mode quadruple precision floating-point adder that implements the two-path algorithm. The dual-mode quadruple precision floating-point adder supports one quadruple precision and two parallel double precision additions. To estimate area and worst-case delay, double, quadruple, dual-mode double, and dual-mode quadruple precision floating-point adders are implemented in VHDL using the improved single-path and the two-path floating-point addition algorithms. The correctness of all the designs is tested and verified through extensive simulation. Synthesis results show that dual-mode double and dual-mode quadruple precision adders designed with the improved single-path algorithm require roughly 26% more area and 10% more delay than double and quadruple precision adders designed with the same algorithm. Synthesis results obtained for adders designed with the two-path algorithm show that dual-mode double and dual-mode quadruple precision adders requires 33% and 35% more area and 13% and 18% more delay than double and quadruple precision adders, respectively. 相似文献
15.
16.
The adders are the vital arithmetic operation for any arithmetic operations like multiplication, subtraction, and division. Binary number additions are performed by the digital circuit known as the adder. In VLSI (Very Large Scale Integration), the full adder is a basic component as it plays a major role in designing the integrated circuits applications. To minimize the power, various adder designs are implemented and each implemented designs undergo defined drawbacks. The designed adder requires high power when the driving capability is perfect and requires low power when the delay occurred is more. To overcome such issues and to obtain better performance, a novel parallel adder is proposed. The design of adder is initiated with 1 bit and has been extended up to 32 bits so as verify its scalability. This proposed novel parallel adder is attained from the carry look-ahead adder. The merits of this suggested adder are better speed, power consumption and delay, and the capability in driving. Thus designed adders are verified for different supply, delay, power, leakage and its performance is found to be superior to competitive Manchester Carry Chain Adder (MCCA), Carry Look Ahead Adder (CLAA), Carry Select Adder (CSLA), Carry Select Adder (CSA) and other adders. 相似文献
17.
18.
浮点加法运算器前导1预判电路的实现 总被引:2,自引:0,他引:2
提出了一种应用于浮点加法器设计中前导1预判电路(LOP)的实现方案。此方案的提出是针对进行浮点加减运算时,尾数相减的结果可能会产生若干个头零,对于前导1的判断将直接影响规格化左移的位数而提出的。前导1的预判与尾数的减法运算并行执行,而不是对减法结果的判断,同时,并行检测预判中可能产生的1位误差,有效缩短了整个加法器的延时。LOP电路设计采用VHDL语言门级描述,已通过逻辑仿真验证,并在浮点加法器的设计中得到应用。 相似文献
19.
Hardware Support for Interval Arithmetic 总被引:1,自引:0,他引:1
A hardware unit for interval arithmetic (including division by an interval that contains zero) is described in this paper.
After a brief introduction an instruction set for interval arithmetic is defined which is attractive from the mathematical
point of view. These instructions consist of the basic arithmetic operations and comparisons for intervals including the relevant
lattice operations. To enable high speed, the case selections for interval multiplication (9 cases) and interval division
(14 cases) are done in hardware. The lower bound of the result is computed with rounding downwards and the upper bound with
rounding upwards by parallel units simultaneously. The rounding mode must be an integral part of the arithmetic operation.
Also the basic comparisons for intervals together with the corresponding lattice operations and the result selection in more
complicated cases of multiplication and division are done in hardware. There they are executed by parallel units simultaneously.
The circuits described in this paper show that with modest additional hardware costs interval arithmetic can be made almost
as fast as simple floating-point arithmetic. 相似文献