Similar Documents
1.
Li Rong, Yu Lunzheng, Microcomputer Development (微机发展), 2007, 17(3): 109-111
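Many division algorithms have been developed for hardware, and they differ in many respects, including how fast the quotient converges, the basic hardware units they need, and their mathematical formulation. This paper introduces the floating-point division and square-root algorithms that are currently popular, analyzes the idea behind each one and the situations it suits, and compares their advantages and disadvantages. The choice of floating-point division algorithm for the LSFT32 processor is given as an example. An algorithm is worth choosing only when its approach and characteristics match the structure of the arithmetic unit; only then can its advantages in speed and area be fully realized.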
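The abstract does not reproduce the algorithms themselves. As one illustration of the multiplicative, quadratically convergent family such surveys compare, here is a minimal Newton-Raphson reciprocal sketch in Python. The function name, the linear seed 48/17 - 32/17·m, and the fixed iteration count are assumptions for illustration, not the LSFT32 design.

```python
import math

def nr_divide(a: float, b: float, iterations: int = 4) -> float:
    """Approximate a / b by refining 1/b with Newton-Raphson (illustrative sketch)."""
    if b == 0.0:
        raise ZeroDivisionError("division by zero")
    m, e = math.frexp(abs(b))          # |b| = m * 2**e with 0.5 <= m < 1
    x = 48.0 / 17.0 - 32.0 / 17.0 * m  # linear seed: relative error <= 1/17 on [0.5, 1)
    for _ in range(iterations):
        # x_{k+1} = x_k * (2 - m * x_k): the relative error roughly squares each step
        x = x * (2.0 - m * x)
    q = math.ldexp(a * x, -e)          # a / |b| = a * (1/m) * 2**-e
    return q if b > 0 else -q
```

Each iteration costs two multiplications, which is why this family suits units built around a fast multiplier, whereas digit-recurrence (SRT-style) methods retire a fixed number of quotient bits per cycle using only adders and shifters.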

2.
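Based on the characteristics of data-selection (multiplexer-based) matrix operations, this paper implements floating-point matrix multiplication by embedding a low-order matrix-operation IP core into the data-selection matrix and adding floating-point addition. This reduces resource consumption while improving system performance, and the improved floating-point matrix unit is implemented on an FPGA. Simulation results show that the design is feasible and has practical value and application prospects.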
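The decomposition described above (a larger floating-point matrix product assembled from a low-order matrix core plus floating-point adders) can be sketched in Python. This assumes a hypothetical fixed 2×2 multiply kernel, `kernel_2x2`, standing in for the IP core, and square matrices whose size is a multiple of 2; it shows the blocking only, not the paper's data-selection wiring.

```python
def kernel_2x2(a, b):
    """Stand-in for a fixed 2x2 floating-point matrix-multiply core (hypothetical)."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]

def block(m, r, c):
    """Extract the 2x2 sub-block whose top-left corner is (r, c)."""
    return [row[c:c + 2] for row in m[r:r + 2]]

def blocked_matmul(a, b):
    """Multiply two n x n matrices by reusing the 2x2 kernel on sub-blocks."""
    n = len(a)
    if n % 2:
        raise ValueError("size must be a multiple of the 2x2 kernel")
    c = [[0.0] * n for _ in range(n)]
    for i in range(0, n, 2):
        for j in range(0, n, 2):
            for k in range(0, n, 2):
                p = kernel_2x2(block(a, i, k), block(b, k, j))
                # the added floating-point adders accumulate the partial block products
                for di in range(2):
                    for dj in range(2):
                        c[i + di][j + dj] += p[di][dj]
    return c
```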

4.
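A conventional fused floating-point multiply-add unit increases the latency of standalone floating-point addition/subtraction, multiplication, and similar operations. To address this drawback, this paper first designs and implements a split-path floating-point multiply-adder, SPFMA. By separating the multiplication and addition paths, it reduces the latency of standalone multiplication, addition, and similar operations from 6 cycles to 4 while keeping the fused multiply-add latency at 6 cycles. Logic synthesis with a dedicated process cell library shows that SPFMA can run above 1.2 GHz with an area of 60,779.44 µm². Finally, performance is evaluated by running the SPEC CPU2000 floating-point benchmarks on a hardware emulation accelerator platform: every floating-point benchmark improves, by up to 5.25% and by 1.61% on average, showing that SPFMA further improves floating-point performance.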

5.
Implementation of an Efficient Multi-Input Floating-Point Adder Architecture on FPGA   Cited by: 3 (self-citations: 1, by others: 3)
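Conventionally, multi-input floating-point addition is implemented by cascading two-input floating-point adders. This structure inevitably multiplies both the latency and the logic resources required, making it increasingly difficult to meet the demands of high-speed digital signal processing. This paper proposes a floating-point data format suited to FPGA implementation and an efficient multi-input floating-point adder architecture that completes within a four-stage pipeline, and presents test data on Xilinx Virtex-series devices.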
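The core idea, one shared alignment and one normalization instead of a chain of two-input adders, can be sketched as a behavioural model in Python. Assumptions: inputs are ordinary Python floats (binary64) rather than the paper's FPGA-specific format, aligned significands are truncated beyond `GUARD` extra bits, and the function name is hypothetical.

```python
import math

GUARD = 8  # extra low-order bits kept during alignment (assumed value)

def multi_input_add(values):
    """Add many floats with one shared alignment, one wide integer add, one normalization."""
    parts = [math.frexp(v) for v in values if v != 0.0]  # v = m * 2**e, 0.5 <= |m| < 1
    if not parts:
        return 0.0
    e_max = max(e for _, e in parts)                     # 1) find the largest exponent
    total = 0
    for m, e in parts:
        sig = int(abs(m) * 2 ** 53) << GUARD             # 53-bit integer significand
        sig >>= e_max - e                                # 2) align every operand to e_max
        total += sig if m > 0 else -sig                  # 3) single wide integer addition
    # 4) one normalization/rounding step when converting the sum back to a float
    return math.ldexp(total, e_max - 53 - GUARD)
```

Only the final conversion rounds, so the cost of normalizing and rounding is paid once per sum rather than once per two-input stage.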

6.
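With the development of application-specific microprocessors for digital signal processing and related fields, the floating-point multiply-add operation has become increasingly important. It fuses multiplication and addition, reducing the execution latency of the combined operation. Building on the multi-path idea, this paper proposes an improved multi-path floating-point multiply-adder. The processing is split into four data paths according to where the addend A falls relative to the product B×C during exponent alignment; using a separate path for each case avoids unnecessary processing delay. Comparison shows that the multi-path multiply-adder has advantages in both speed and power consumption.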
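Fusing also changes the numerics, not just the latency: a fused multiply-add rounds once, whereas a separate multiply followed by an add rounds twice. The small Python demonstration below models the single rounding with exact rational arithmetic from the standard `fractions` module (Python's `math.fma` exists only from 3.13, so it is not assumed here); the helper name `fused_multiply_add` is hypothetical.

```python
from fractions import Fraction

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Model of an FMA: compute a*b + c exactly, then round once to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # Fraction -> float rounds to nearest

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
unfused = a * b + c                   # a*b = 1 - 2**-60 rounds to 1.0 first, so the sum is 0.0
fused = fused_multiply_add(a, b, c)   # keeps the tiny exact result, -2**-60
print(unfused, fused)
```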

7.
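This paper designs a floating-point adder for the single-precision format of the IEEE 754 standard. SystemC, a newer C++-based hardware design language, has advantages over conventional HDLs in system-level modeling and hardware/software co-design, which makes it well suited to SoC design and modeling. By analyzing the floating-point addition flow, taking its algorithm design and structural mapping as an example, the paper discusses each step of floating-point addition and arrives at a design suited to the standard format. Together with a discussion of how to use SystemC for system design, it gives SystemC descriptions of some of the adder's modules.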
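The basic addition flow (unpack, align exponents, add significands, normalize, round) can be modelled bit-accurately in Python. This sketch is not SystemC and deliberately covers only the same-sign case: two positive normal single-precision operands with round-to-nearest-even. Subtraction, subnormals, infinities, and NaNs are left out, and the function names are hypothetical.

```python
import struct

def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def fp32_add_positive(a: float, b: float) -> float:
    """Add two positive normal binary32 values, rounding to nearest, ties to even."""
    fields = []
    for x in (a, b):
        bits = f32_to_bits(x)
        exp = (bits >> 23) & 0xFF
        if bits >> 31 or exp in (0, 0xFF):
            raise ValueError("sketch handles positive normal numbers only")
        fields.append((exp, (bits & 0x7FFFFF) | 0x800000))  # restore the hidden 1
    (ea, ma), (eb, mb) = sorted(fields, reverse=True)     # ensure ea >= eb
    ma, mb = ma << 3, mb << 3                               # room for guard/round/sticky bits
    d = ea - eb
    if d >= 27:
        mb = 1                                              # all bits shifted out: sticky only
    elif d:
        lost = mb & ((1 << d) - 1)
        mb = (mb >> d) | (1 if lost else 0)                 # align; OR lost bits into sticky
    s, e = ma + mb, ea
    if s >> 27:                                             # carry out: shift right once
        s = (s >> 1) | (s & 1)                              # keep the sticky bit
        e += 1
    low, s = s & 7, s >> 3                                  # G/R/S bits and 24-bit significand
    if low > 4 or (low == 4 and s & 1):                     # round to nearest, ties to even
        s += 1
        if s >> 24:                                         # rounding overflowed the significand
            s >>= 1
            e += 1
    if e >= 0xFF:
        return float("inf")
    return bits_to_f32((e << 23) | (s & 0x7FFFFF))
```

For example, `fp32_add_positive(1.0, 2.0 ** -24)` returns 1.0: the addend is exactly half an ulp, and the tie goes to the even significand.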

8.
Floating-Point Inner Product Design Based on the MicroBlaze Processor   Cited by: 1 (self-citations: 0, by others: 1)
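Floating-point inner products are widely used in signal and image processing. Exploiting the flexibility and extensibility of soft-core processors, this paper presents a floating-point inner-product structure based on the MicroBlaze processor. The design uses IEEE 754 double-precision numbers, and an accumulation circuit suited to inner-product computation is obtained by improving the DSA circuit. Using the EDK design platform, the inner-product unit is attached to the MicroBlaze soft-core processor over the FSL bus in an SOPC system, so that the processor can invoke the hardware unit.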

9.
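Floating-point addition is the most frequently used floating-point operation. This paper adopts a five-stage adder pipeline and codes it in the Verilog HDL hardware description language. Synthesized with the SMIC 0.18 µm CMOS process library, the design reaches an operating frequency of 500 MHz.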

10.
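In elliptic curve cryptosystems, finite-field multiplication is the most critical operation. In fields based on a type II normal basis, addition is fast and squaring is simple, but multiplication is complex and becomes the bottleneck of arithmetic in the field. To solve this problem, the paper analyzes the serial multiplication algorithm and improves it; compared with the serial algorithm, the improved algorithm needs fewer cycles and runs noticeably faster. A parallel multiplier architecture is designed from the improved algorithm and implemented on an FPGA, providing a hardware basis for further speeding up elliptic curve encryption.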

11.
Floating-point fast Fourier transform (FFT) is widely used in scientific computing and high-resolution imaging applications because of its wide dynamic range and high processing precision. However, it suffers from high area and energy overhead compared with fixed-point implementations. To address these issues, this paper presents an area- and energy-efficient hybrid architecture for floating-point FFT computations. It minimizes the required arithmetic units and significantly reduces memory usage by combining three different parts. The serial radix-4 butterfly (SR4BF) is used in the single-path delay commutator (SDC) part to minimize the required arithmetic units, achieving a 100% adder utilization ratio. A modified single-path delay feedback (MSDF) architecture is proposed to trade off arithmetic resources against memory usage by using the new half radix-4 butterfly (HR4BF), with a 50% adder utilization ratio. The intermediate caching buffer is modified accordingly in the MSDF part. By combining the advantages of reduced arithmetic units and optimized memory usage in the different parts, area and power are optimized without throughput loss. Logic synthesis results in a 65 nm CMOS technology show that the energy per FFT is about 331.5 nJ for 1024-point FFT computations at 400 MHz. The total hardware overhead is equivalent to 460k NAND2 gates.
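The SR4BF and HR4BF units are serialized forms of the radix-4 butterfly. For reference, here is a plain software sketch in Python of a recursive radix-4 decimation-in-time FFT, showing what that butterfly computes. It assumes the length is a power of 4 and says nothing about the SDC/MSDF scheduling or buffering described in the paper.

```python
import cmath

def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4:
        raise ValueError("length must be a power of 4")
    quarter = n // 4
    subs = [fft_radix4(x[r::4]) for r in range(4)]   # four interleaved sub-FFTs
    out = [0j] * n
    for k in range(quarter):
        # twiddle-multiply the four sub-results: a_r = W_N^(r*k) * F_r[k]
        a0, a1, a2, a3 = (subs[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(4))
        # radix-4 butterfly: X[k + q*N/4] = sum over r of a_r * (-j)^(r*q)
        out[k] = a0 + a1 + a2 + a3
        out[k + quarter] = a0 - 1j * a1 - a2 + 1j * a3
        out[k + 2 * quarter] = a0 - a1 + a2 - a3
        out[k + 3 * quarter] = a0 + 1j * a1 - a2 - 1j * a3
    return out
```

Inside the butterfly, the factors ±1 and ±j need only additions, subtractions, and real/imaginary swaps, which is why adder utilization is the figure of merit the abstract reports.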

12.
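This paper describes the basic operating steps of a floating-point adder (FPA), summarizes the conventional multi-input floating-point adder algorithms, and proposes an improved parallel multi-input floating-point adder algorithm. The improved algorithm effectively increases speed and reduces logic resources.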

13.
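High-precision, high-performance floating-point units are an important part of high-performance microprocessor design. Building on a study of conventional double-precision floating-point multiply-add algorithms and on the characteristics of the quadruple-precision floating-point format, this paper designs and implements a high-performance quadruple-precision floating-point multiply-adder (QPFMA). It supports multiple floating-point operations, has a latency of 7 cycles, and is fully pipelined. A dual-path adder is used to improve the algorithm structure, and the leading-zero anticipation and normalization-shift logic are optimized to reduce latency and hardware cost. A parameterized design-verification method provides efficient functional verification. Logic synthesis in a 65 nm process shows that the QPFMA reaches 1.2 GHz; compared with existing QPFMA designs, latency is reduced by 3 cycles and frequency is increased by about 11.63%.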

14.
Ahmet, Journal of Systems Architecture, 2008, 54(12): 1129-1142
Most modern microprocessors provide multiple identical functional units to increase performance. This paper presents dual-mode floating-point adder architectures that support one higher precision addition and two parallel lower precision additions. A double precision floating-point adder implemented with the improved single-path algorithm is modified to design a dual-mode double precision floating-point adder that supports both one double precision addition and two parallel single precision additions. A similar technique is used to design a dual-mode quadruple precision floating-point adder that implements the two-path algorithm. The dual-mode quadruple precision floating-point adder supports one quadruple precision and two parallel double precision additions. To estimate area and worst-case delay, double, quadruple, dual-mode double, and dual-mode quadruple precision floating-point adders are implemented in VHDL using the improved single-path and the two-path floating-point addition algorithms. The correctness of all the designs is tested and verified through extensive simulation. Synthesis results show that dual-mode double and dual-mode quadruple precision adders designed with the improved single-path algorithm require roughly 26% more area and 10% more delay than double and quadruple precision adders designed with the same algorithm. Synthesis results obtained for adders designed with the two-path algorithm show that dual-mode double and dual-mode quadruple precision adders require 33% and 35% more area and 13% and 18% more delay than double and quadruple precision adders, respectively.
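The abstract does not detail how the single-precision lanes are carved out. A common way to let one wide adder also perform two independent narrower additions is to block the carry at the lane boundary. The minimal Python sketch below does this on raw integer significand lanes, assuming 64-bit operands split into two 32-bit lanes with a spare gap bit between them; it models only this carry-partitioning idea, not the paper's exponent, alignment, or rounding logic, and `dual_mode_add` is a hypothetical name.

```python
LANE = 32
LANE_MASK = (1 << LANE) - 1
WIDE_MASK = (1 << (2 * LANE + 1)) - 1   # 65-bit adder: two lanes plus one gap bit

def spread(x: int) -> int:
    """Insert a gap bit between the lanes: [hi][gap][lo]."""
    return ((x >> LANE) << (LANE + 1)) | (x & LANE_MASK)

def gather(x: int) -> int:
    """Drop the gap bit again."""
    return ((x >> (LANE + 1)) << LANE) | (x & LANE_MASK)

def dual_mode_add(a: int, b: int, split: bool) -> int:
    """One 65-bit adder used as a 64-bit adder or as two independent 32-bit adders."""
    # gap bit 1 in one operand lets a low-lane carry ripple into the high lane;
    # gap bits 0 in both operands absorb that carry, so the lanes stay independent
    gap = 0 if split else 1
    total = (spread(a) | (gap << LANE)) + spread(b)
    return gather(total & WIDE_MASK)
```

Setting one control bit per lane boundary is cheap compared with duplicating the adder, which is the area/delay trade-off the synthesis figures above quantify for the full floating-point designs.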

15.
Structural Optimization of a Flagged Prefix Adder   Cited by: 1 (self-citations: 1, by others: 0)
Xu Tuanhui, Wang Yuyan, Zhang Jianxiong, Computer Engineering (计算机工程), 2010, 36(13): 286-287, 290
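Flagged prefix adders are fast but occupy a large area. To meet the area requirements of floating-point multiply-add units in practical applications, the structure is optimized into a 51-bit flagged prefix adder based on the Kogge-Stone tree; cascading modules reduces the number of arithmetic cells, which shrinks the area of the floating-point multiply-add unit and lowers its power. In a TSMC 0.18 µm process, the area, total power, and critical-path delay of the 51-bit adder are reduced by 10%, 10.5%, and 6.4%, respectively.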
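A bit-level Python model of a Kogge-Stone parallel-prefix carry tree follows. The "flag" part uses one common formulation: the flag for bit i is the group propagate of all lower bits, which the tree already computes, and XOR-ing the flags into the sum yields A + B + 1 alongside A + B. The 51-bit default width matches the abstract, but the paper's module cascading is not modeled and the function name is hypothetical.

```python
def kogge_stone_flagged_add(a: int, b: int, width: int = 51):
    """Return (a + b, a + b + 1), both mod 2**width, from one Kogge-Stone prefix tree."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]
    d = 1
    while d < width:
        # every node combines with the node d positions below it; the tuple
        # assignment ensures only the previous level's values are used
        G, P = ([G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(width)],
                [P[i] & P[i - d] if i >= d else P[i] for i in range(width)])
        d *= 2
    # after log2(width) levels, G[i] and P[i] cover the whole span i..0
    s = s_plus_one = 0
    for i in range(width):
        carry = G[i - 1] if i else 0    # carry into bit i with carry-in 0
        flag = P[i - 1] if i else 1     # flag: every lower bit propagates
        s |= (p[i] ^ carry) << i
        s_plus_one |= (p[i] ^ carry ^ flag) << i   # inverting flagged bits adds 1
    return s, s_plus_one
```

Because the flags come for free from the prefix tree, an FMA can obtain both the sum and its increment (useful for rounding or end-around-carry style subtraction) from a single adder.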

16.
Adders are the basis of other arithmetic operations such as multiplication, subtraction, and division. An adder is the digital circuit that performs binary addition. In VLSI (Very Large Scale Integration), the full adder is a basic component because it plays a major role in integrated-circuit applications. Various adder designs have been implemented to minimize power, and each has its own drawbacks: a design with strong drive capability consumes high power, while a low-power design incurs more delay. To overcome these issues and obtain better performance, a novel parallel adder derived from the carry look-ahead adder is proposed. The design starts at 1 bit and is extended up to 32 bits to verify its scalability. Its merits are better speed, power consumption, delay, and drive capability. The designed adders are evaluated across supply voltages for delay, power, and leakage, and their performance is found to be superior to the competing Manchester Carry Chain Adder (MCCA), Carry Look-Ahead Adder (CLAA), Carry Select Adder (CSLA), Carry Save Adder (CSA), and other adders.

17.
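This paper proposes a finite-field polynomial arithmetic module architecture for configurable elliptic curve cryptosystems. The multiplier builds on an existing digit-serial multiplier and uses a locally parallel (bit-parallel) structure, which eliminates the modular reduction circuit and lets the multiplier work with any irreducible polynomial. The squarer computes the modular square using an LSB or LSD multiplier together with an adder, and a data interface controls the format of the input data, so that point multiplication over finite fields of different sizes can be supported.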
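As a point of reference for the digit-serial approach, here is a minimal Python sketch of an MSB-first digit-serial multiplier in a polynomial-basis field GF(2^m): each step multiplies the accumulator by x^D and adds the partial product of one D-bit digit of b (Horner's rule). The digit size and helper names are assumptions, and unlike the design in the abstract, this sketch reduces modulo the given irreducible polynomial explicitly at every step.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (XOR) multiplication of two binary polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def reduce_poly(v: int, f: int) -> int:
    """Reduce polynomial v modulo the irreducible polynomial f."""
    m = f.bit_length() - 1
    while v.bit_length() > m:
        v ^= f << (v.bit_length() - 1 - m)
    return v

def digit_serial_mul(a: int, b: int, f: int, digit: int = 4) -> int:
    """MSB-first digit-serial multiplication in GF(2^m) = GF(2)[x]/(f)."""
    m = f.bit_length() - 1
    acc = 0
    for k in reversed(range(-(-m // digit))):          # ceil(m / digit) digits
        d = (b >> (k * digit)) & ((1 << digit) - 1)    # next D-bit digit of b
        # Horner step: acc = acc * x^D + a * d (mod f)
        acc = reduce_poly((acc << digit) ^ clmul(a, d), f)
    return acc
```

For example, in the AES field with f = 0x11B, `digit_serial_mul(0x57, 0x83, 0x11B)` gives 0xC1, matching the worked example in FIPS 197. A larger digit size means fewer cycles per product but a wider partial-product array, which is the trade-off digit-serial designs tune.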

18.
Implementation of a Leading-One Prediction Circuit for a Floating-Point Adder   Cited by: 2 (self-citations: 0, by others: 2)
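This paper presents an implementation of a leading-one prediction (LOP) circuit for floating-point adder design. It targets the fact that, in floating-point addition and subtraction, subtracting the significands may produce several leading zeros, and the position of the leading one directly determines how far the result must be shifted left during normalization. The leading one is predicted in parallel with the significand subtraction rather than detected from the subtraction result, and a possible 1-bit prediction error is also detected in parallel, which effectively shortens the overall adder delay. The LOP circuit is described at gate level in VHDL, has been verified by logic simulation, and has been used in a floating-point adder design.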
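A simplified Python model of the idea follows. It assumes the operands are already ordered so that a > b (as when the larger significand is selected before subtracting), predicts the leading-one position of a - b from the operand bits alone, and then applies the 1-bit correction mentioned above. The predictor rule is a textbook-style simplification, not the paper's gate-level circuit, and the function names are hypothetical.

```python
def lop_predict(a: int, b: int) -> int:
    """Predict the leading-one position of a - b (a > b >= 0) without subtracting.

    The prediction is either exact or one position too high.
    """
    if not a > b >= 0:
        raise ValueError("requires a > b >= 0")
    k = (a ^ b).bit_length() - 1            # highest differing bit: a_k = 1, b_k = 0
    s = ~a & b                              # positions with a_i = 0, b_i = 1 (borrow run)
    f = ((1 << (k + 1)) - 1) & ~(s << 1)    # f_i = 1 iff i <= k and s_(i-1) = 0
    return f.bit_length() - 1               # leading one of the indicator string

def normalize_difference(a: int, b: int, width: int):
    """Left-normalize a - b (a, b < 2**width) using the predicted shift plus correction."""
    shift = (width - 1) - lop_predict(a, b)  # known before the subtraction finishes
    d = (a - b) << shift
    if not (d >> (width - 1)) & 1:           # prediction was one position too high
        d <<= 1
        shift += 1
    return d, shift
```

The indicator string depends only on the operands, so in hardware it can be encoded into a shift amount while the subtractor is still running; only the final 1-bit check waits for the difference.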

19.
Hardware Support for Interval Arithmetic   Cited by: 1 (self-citations: 0, by others: 1)
A hardware unit for interval arithmetic (including division by an interval that contains zero) is described in this paper. After a brief introduction an instruction set for interval arithmetic is defined which is attractive from the mathematical point of view. These instructions consist of the basic arithmetic operations and comparisons for intervals including the relevant lattice operations. To enable high speed, the case selections for interval multiplication (9 cases) and interval division (14 cases) are done in hardware. The lower bound of the result is computed with rounding downwards and the upper bound with rounding upwards by parallel units simultaneously. The rounding mode must be an integral part of the arithmetic operation. Also the basic comparisons for intervals together with the corresponding lattice operations and the result selection in more complicated cases of multiplication and division are done in hardware. There they are executed by parallel units simultaneously. The circuits described in this paper show that with modest additional hardware costs interval arithmetic can be made almost as fast as simple floating-point arithmetic.
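Python cannot switch the FPU rounding mode, so the sketch below emulates the directed rounding described above by rounding to nearest and then stepping one ulp outward with `math.nextafter` (Python 3.9+). This gives a valid but slightly wider enclosure than true rounding downward/upward. It also swaps the 9-case sign selection for a simpler min/max over the four endpoint products, and the function name is hypothetical.

```python
import math

def interval_mul(x, y):
    """Multiply intervals x = (lo, hi) and y = (lo, hi) with outward rounding."""
    products = [a * b for a in x for b in y]          # the four endpoint products
    # step one ulp past the round-to-nearest result so the true bound is enclosed
    lower = math.nextafter(min(products), -math.inf)  # stands in for rounding downward
    upper = math.nextafter(max(products), math.inf)   # stands in for rounding upward
    return lower, upper
```

Computing all four products is what the hardware's case selection avoids: by inspecting endpoint signs first, it usually needs only one product per bound, and the two bounds can then go to parallel units. Division by an interval containing zero is not handled here.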

20.
Zhang Jiakang, Chen Qingkui, Computer Engineering (计算机工程), 2010, 36(15): 179-181
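To examine how well GPUs, stream-processing devices with high floating-point throughput, suit neural networks, this paper proposes a parallel recognition algorithm for convolutional neural networks based on the Compute Unified Device Architecture (CUDA), defines parallel data structures on it, and describes how computing tasks are mapped onto CUDA. Experimental results show that the parallel recognition algorithm implemented on a GPU with the GTX200 architecture achieves an average peak floating-point throughput nearly 60 times that of the serial algorithm on a CPU, making it well suited to neural-network applications.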
