1.

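李蓉 于伦正 《计算机技术与发展》2007,17(3):109-112

Many division algorithms have been developed for hardware design; they differ in quotient convergence rate, basic hardware units, underlying mathematical formulation, and many other respects. This paper introduces the floating-point division and square-root algorithms in common use today, analyzes the idea behind each and the situations each suits, and compares their strengths and weaknesses. The choice of floating-point division algorithm in the LSFT32 processor serves as an example. An algorithm is a meaningful choice only when its approach and characteristics match the structure of the arithmetic unit, so that its speed and area advantages can be fully exploited.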

2.

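Research on a VHDL-Based Floating-Point Algorithm

夏阳 邹莹 《计算机仿真》2007,24(4):87-90

Floating-point arithmetic is the most basic operation in digital signal processing, but because current EDA software provides no floating-point support, implementing it in an FPGA is a thorny problem. The paper proposes a high-precision floating-point algorithm based on VHDL. Taking a 9-bit real sequence as an example, it implements floating-point addition/subtraction, multiplication, division, and square root efficiently and accurately through floating-point representation, exponent alignment, mantissa arithmetic, and normalization. The operations were then downloaded to and implemented in an FPGA, and test results are given. The test data show that, within the precision range corresponding to its floating-point word width, the designed algorithm successfully implements addition, subtraction, multiplication, division, and square root on the FPGA.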

3.

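Implementing High-Precision Division on Fixed-Point DSPs (total citations: 1, self-citations: 0, citations by others: 1)

刘洪鸣 邱建辉 邱奕文 《单片机与嵌入式系统应用》2009,(1):69-71

Integrated single-chip digital signal processors (DSPs), with their powerful functionality, high integration, flexible application, and good price-performance ratio, play an increasingly dominant role in signal processing and system control. Many signal-processing and control tasks require division, yet typical DSPs have no ready-made division instruction. Floating-point DSPs, which appeared more than a decade ago, perform floating-point arithmetic in hardware; they far exceed fixed-point DSPs in data-processing and computing capability and also handle division more simply than fixed-point DSPs.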

4.

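High-Performance Double-Precision Floating-Point Divider Design Based on the Goldschmidt Algorithm

何婷婷 彭元喜 雷元武 《计算机应用》2015,35(7):1854-1857

Double-precision floating-point division usually involves a complex computation and long latency. To address this, the paper proposes a method for designing a high-performance double-precision floating-point divider that is based on the Goldschmidt algorithm and supports the IEEE-754 standard. First, the Goldschmidt division process and the error produced by the iterations are analyzed; then a method for controlling the error is proposed. Next, an area-efficient double-lookup-table method determines the initial iteration value, and the iteration unit uses a parallel multiplier structure to speed up iteration. Finally, the pipeline stages are partitioned appropriately and the iteration process is controlled so that floating-point divisions can execute in a pipelined fashion, further raising the divider's throughput. Experimental results show that, in a 40 nm process, the divider with a 14-bit initial value and a pipelined structure has a synthesized cell area of 84902.2618 μm² and runs at up to 2.2 GHz; compared with a pipelined structure using an 8-bit initial value, it is 32.73% faster with 5.05% more area. A single double-precision division has a latency of 12 clock cycles; in pipelined execution, the average latency per division is 3 clock cycles. Data throughput improves by a factor of 3 to 7 over SRT-based double-precision dividers in other processors, and by a factor of 2 to 3 over Goldschmidt-based double-precision dividers in other processors.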
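
As a rough illustration of the Goldschmidt iteration this abstract refers to, the sketch below multiplies numerator and denominator by the same correction factor until the denominator approaches 1. It is a software sketch only: the function name `goldschmidt_divide`, the use of `math.frexp` for prescaling, and the fixed iteration count are assumptions, and it omits the lookup-table initial value and pipelining that the paper describes.

```python
import math


def goldschmidt_divide(n: float, d: float, iterations: int = 6) -> float:
    """Approximate n / d for a positive divisor d by Goldschmidt iteration."""
    if d <= 0:
        raise ValueError("this sketch assumes a positive divisor")
    # Prescale so the denominator lies in [0.5, 1): d = m * 2**e.
    m, e = math.frexp(d)
    num = math.ldexp(n, -e)
    den = m
    for _ in range(iterations):
        factor = 2.0 - den  # correction factor drives den toward 1
        num *= factor       # the same factor keeps num / den unchanged
        den *= factor
    return num              # den is now ~1, so num approximates n / d


quotient = goldschmidt_divide(7.5, 2.5)
```

Without a table-based initial value the denominator error starts as large as 0.5 and is squared on every pass, which is why this sketch uses six iterations for double precision; a better starting estimate would need fewer.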

5.

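An Offset-Combination Squaring Method Based on Fast Square-Root Computation

陈宇 王遵立 金圣经 毕淑艳 张建涛 《数据采集与处理》2000,15(1):69-73

Starting from a fast square-root algorithm and examining its computation from another angle, the paper proposes a new squaring method based on the algorithm's characteristics, then refines and implements it; the method has been applied successfully in several small control systems. The algorithm makes multi-byte floating-point squaring much faster than squaring performed purely by multiplication.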

6.

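Research and Implementation of Floating-Point Arithmetic on the Cortex-M3 Core

梅静静 王申良 《单片机与嵌入式系统应用》2011,11(1):40-41,45

By analyzing the structure of the Cortex-M3 core and the floating-point format, and by making full use of powerful features of the core such as branch prediction, single-cycle multiplication, and hardware division, the authors implement single-precision floating-point addition, subtraction, multiplication, division, and comparison using the Thumb-2 instruction set. Flowcharts for addition and subtraction and the source code for division are given.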

7.

Research and Design of an FPU Based on the RISC-V Floating-Point Instruction Set

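潘树朋 刘有耀 焦继业 李昭 《计算机工程与应用》2021,57(3):80-86

Software implementations of floating-point arithmetic are slow, cannot meet the real-time requirements of embedded processors, and support only a limited set of operations. To address these problems, the paper proposes a floating-point processor based on the RISC-V instruction set that executes addition, subtraction, multiplication, division, square root, multiply-accumulate, and comparison, and fully conforms to the IEEE 754-2008 standard. The floating-point processor was functionally verified in the VCS simulation environment, and every module met the correctness requirements. It was integrated with the open-source Hummingbird E203 (蜂鸟E203) processor core, logic synthesis was completed with an SMIC 0.18 process library, and the design was tested on an FPGA. The results show that the floating-point processor uses only 24 200 logic gates and achieves a throughput of 150 MFLOPS; compared with designs in the published literature, its hardware area is 7% and 1.5% smaller, respectively. The synthesized operating frequency reaches 100 MHz.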

8.

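Integrated Design of Division and Square-Root Computation in an Embedded Coprocessor (total citations: 2, self-citations: 0, citations by others: 2)

梁政 沈绪榜 《计算机研究与发展》2001,38(8):1016-1020

Implementing division and square root serially in a floating-point processing element is slow, but the design is simple and regular and uses few resources, which suits embedded applications. Drawing on the development of the embedded coprocessor LSC87, the paper presents a radix-4 SRT algorithm for serial division and square root and describes how to verify the uncertain regions that arise when determining the SRT selection constants. It gives a radix-4 SRT lookup-table design that division and square root can share, and discusses converting the redundant iteration result to non-redundant binary. The coprocessor design makes maximum use of the general data path to implement the SRT algorithm, saving design resources and shortening iteration time.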

9.

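A Fast Floating-Point Square-Root Method Based on Table Lookup

陈龙永 梁兴东 丁赤飚 《微计算机信息》2009,25(6)

In real-time computation on floating-point DSPs, the square-root algorithm consumes a large share of the computing time and becomes one of the bottlenecks. This paper proposes a fast floating-point square-root method that combines the structure of binary floating-point numbers with table lookup. It analyzes theoretically the relationship between a floating-point number and its square root and, by combining the binary floating-point structure with a lookup table, reduces the square root to simple steps such as shifts, additions, and table lookups, giving high precision and high speed. Compared with the C standard library function, the method can cut computation time by 70%.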
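
The general idea of combining the binary floating-point structure with a lookup table can be sketched as follows: split the input into mantissa and exponent, halve the exponent, and look up the square root of the mantissa. This is not the paper's method; the table size `TABLE_BITS`, the nearest-entry lookup, and the use of `math.frexp`/`math.ldexp` in place of direct bit-field shifts are all assumptions made for illustration.

```python
import math

TABLE_BITS = 10  # assumed table size: 2**10 entries
TABLE_SIZE = 1 << TABLE_BITS

# Precomputed square roots at the midpoint of each mantissa bucket in [0.25, 1).
_SQRT_TABLE = [
    math.sqrt(0.25 + 0.75 * (i + 0.5) / TABLE_SIZE) for i in range(TABLE_SIZE)
]


def table_sqrt(x: float) -> float:
    """Approximate sqrt(x) with one table lookup and an exponent halving."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= m < 1
    if e % 2:             # make the exponent even so it can be halved exactly
        m *= 0.5          # m moves into [0.25, 0.5)
        e += 1
    index = min(int((m - 0.25) / 0.75 * TABLE_SIZE), TABLE_SIZE - 1)
    return math.ldexp(_SQRT_TABLE[index], e // 2)


approx_root = table_sqrt(2.0)
```

With 1024 entries and no interpolation the relative error stays below roughly 0.1%; a larger table or a linear correction term would trade memory or one extra multiply-add for more precision.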

10.

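Analysis of Radix-4 Division and Square-Root Algorithms

陈兴业 《计算机学报》1983,(4)

This paper proposes a fast radix-4 algorithm for division and square root and gives a detailed proof. The starting point is to reduce the number of iterations by producing two result bits per step rather than the single bit per step of the conventional binary algorithm. Compared with the latter, the number of iterations is halved, so computation is considerably faster. Because the algorithm combines the features common to the two operations, it also saves some hardware.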
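
To make the "two result bits per step" point concrete, here is a minimal sketch of unsigned radix-4 restoring division on integers. It uses the plain digit set {0, 1, 2, 3} chosen by comparison, not the redundant digit sets of SRT division and not the algorithm proved in the paper; the function name and the `bits` parameter are assumptions.

```python
def radix4_divide(dividend: int, divisor: int, bits: int = 32):
    """Unsigned division that retires two quotient bits per iteration.

    Returns (quotient, remainder, iterations).
    """
    if bits % 2:
        raise ValueError("bits must be even for radix-4 steps")
    if divisor <= 0 or dividend < 0 or dividend >> bits:
        raise ValueError("operands out of range for this sketch")
    quotient = 0
    remainder = 0
    iterations = 0
    for shift in range(bits - 2, -1, -2):
        # Bring down the next two dividend bits.
        remainder = (remainder << 2) | ((dividend >> shift) & 0b11)
        # Pick the largest digit in {0, 1, 2, 3} with digit * divisor <= remainder.
        digit = 0
        while digit < 3 and remainder >= (digit + 1) * divisor:
            digit += 1
        remainder -= digit * divisor
        quotient = (quotient << 2) | digit
        iterations += 1
    return quotient, remainder, iterations


result = radix4_divide(1000, 7, bits=16)
```

A 16-bit operand takes 8 iterations here, against 16 for a bit-at-a-time scheme, which is the halving the abstract describes.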

11.

Elements of the theory and technique of multiplication-division operations in pseudoneural systems

I. P. Kobyak 《Automatic Control and Computer Sciences》2008,42(1):10-19

We study the mechanism of performing multiplication and division operations determined as functions of simple multiple addition and subtraction. The hardware realization of the operation unit synthesized while taking into account the projection of the neural system structure onto the complex of logical and memory elements is developed. The formulated concept of performing arithmetic transformations is considered a novel technical idea concerning the nature of real neural calculations. The division and multiplication algorithms realized in parallel are represented in the form of formal transformations and can be used for the synthesis of hardware and software simulation of calculation algorithms.

12.

Hardware Support for Interval Arithmetic (total citations: 1, self-citations: 0, citations by others: 1)

Reinhard Kirchner Ulrich W. Kulisch 《Reliable Computing》2006,12(3):225-237

A hardware unit for interval arithmetic (including division by an interval that contains zero) is described in this paper. After a brief introduction an instruction set for interval arithmetic is defined which is attractive from the mathematical point of view. These instructions consist of the basic arithmetic operations and comparisons for intervals including the relevant lattice operations. To enable high speed, the case selections for interval multiplication (9 cases) and interval division (14 cases) are done in hardware. The lower bound of the result is computed with rounding downwards and the upper bound with rounding upwards by parallel units simultaneously. The rounding mode must be an integral part of the arithmetic operation. Also the basic comparisons for intervals together with the corresponding lattice operations and the result selection in more complicated cases of multiplication and division are done in hardware. There they are executed by parallel units simultaneously. The circuits described in this paper show that with modest additional hardware costs interval arithmetic can be made almost as fast as simple floating-point arithmetic.
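
The outward-rounding requirement in this abstract can be mimicked in software. The sketch below multiplies two intervals by taking the minimum and maximum of the four endpoint products (instead of the nine-case selection done in hardware) and then widens each bound by one unit in the last place with `math.nextafter`, available from Python 3.9. Widening by one ulp is a conservative stand-in for true directed rounding, and the function name and tuple representation are assumptions; finite endpoints are assumed.

```python
import math


def interval_mul(a, b):
    """Multiply intervals a = (lo, hi) and b = (lo, hi) with outward rounding."""
    (alo, ahi), (blo, bhi) = a, b
    products = [alo * blo, alo * bhi, ahi * blo, ahi * bhi]
    # Each product is rounded to nearest, so step one ulp outward on both
    # sides to guarantee the true result is enclosed.
    lo = math.nextafter(min(products), -math.inf)
    hi = math.nextafter(max(products), math.inf)
    return lo, hi


bounds = interval_mul((-1.0, 2.0), (3.0, 4.0))
```

The result encloses [-4, 8] but is slightly wider than necessary; hardware with per-operation rounding modes, as the paper proposes, would return the tightest bounds.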

13.

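A Double-Step Algorithm for Generating Ellipses (total citations: 2, self-citations: 0, citations by others: 2)

阎双 唐棣 《计算机工程与应用》2006,42(33):66-67

After an in-depth study of existing circle and ellipse generation algorithms, the paper proposes an algorithm that generates elliptical arcs two points at a time. Compared with similar algorithms, its decision function is simple to construct and recursive, and it uses only integer addition, subtraction, and shift operations. Comparisons show that it runs faster than other existing algorithms. The algorithm is well suited to hardware implementation.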

14.

Comparative implementations of the LMS algorithm

Michael Andrews 《Computers & Electrical Engineering》1986,12(3-4):119-135

A study of available hardware algorithms was made in order to design adaptive signal processors with VLSI. A suitable model invoking synchrony, topology, and granularity has been chosen to investigate design figures-of-merit for each implementation. At present, redundant arithmetic is being contrasted, basically because carry-free operations are possible, resulting in a speedup. This paper focuses on models and primitive computational elements for the least-mean-square (LMS) algorithm embedded in conventional two's complement, bit-serial or distributed arithmetic, and redundant arithmetic processors.
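
For readers unfamiliar with the LMS algorithm named here, a minimal floating-point sketch of the weight update follows. It says nothing about the arithmetic styles the paper compares; the function name, the tap count, the step size `mu`, and the two-tap system used in the demonstration are all assumptions.

```python
import random


def lms_filter(x, d, num_taps=4, mu=0.05):
    """Adapt FIR weights so the filter output tracks the desired signal d."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps input samples, newest first, zero-padded.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))           # filter output
        e = d[n] - y                                       # estimation error
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]     # LMS weight update
        errors.append(e)
    return w, errors


# Demonstration: identify a hypothetical 2-tap system h = [0.5, -0.3].
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n else 0.0) for n in range(len(x))]
weights, errors = lms_filter(x, d, num_taps=2, mu=0.1)
```

Each update costs one multiply-accumulate per tap for the output and another for the weights, which is the kind of primitive operation whose hardware realization the paper examines.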

15.

A Run-Length Slice Line Drawing Algorithm without Division Operations (total citations: 1, self-citations: 0, citations by others: 1)

Khun Yee Fung Tina M. Nicholl A. K. Dewdney 《Computer Graphics Forum》1992,11(3):267-277

Of the two major approaches to line drawing, run-length slice algorithms are seldom used because of the division operation deemed necessary in these algorithms. The biggest advantage of these algorithms, the reduction of additions used, is considered outweighed by the division used. In this paper, a new run-length slice algorithm that does not require a division operation is presented. Furthermore, it uses the double-stepping paradigm in incremental line drawing algorithms to reduce the number of additions used by at least half. For sufficiently long lines, this algorithm uses at least 50% fewer arithmetic operations than Wu et al.'s bi-directional double-step incremental algorithm. But because of its high initialization cost, for short lines, it is less efficient. For a line with endpoints (0,0) and (δx, δy), the strategy is then to use the bi-directional Bresenham algorithm for very short lines (δx < 20), the bi-directional double-step algorithm for moderately long lines (20 ≤ δx ≤ 110), and the new algorithm for the longer lines (δx > 110).

16.

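Pixel-Level Generation and Antialiasing Algorithms for Circles (total citations: 5, self-citations: 1, citations by others: 4)

刘勇奎 石教英 《计算机辅助设计与图形学学报》2005,17(1):34-41

The paper reviews the state of research on point-by-point circle generation algorithms, points out that the overlooked Kuzmin algorithm for point-by-point generation of circular arcs requires the least computation, and identifies and corrects a serious error in it. It then proposes an algorithm that generates circular arcs two points at a time, using only integer arithmetic to select the pixels closest to the arc; comparisons show that it runs faster than other existing algorithms. Finally, building on this algorithm, it proposes an algorithm for generating antialiased circular arcs without increasing the computational cost. Compared with the Wu-Rokne algorithm, the only comparable double-point antialiased arc algorithm, it produces 4 more intermediate gray levels and reduces the maximum intensity error of the antialiased arc by 40%. The algorithm is well suited to hardware implementation.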
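
As background for the integer-only, point-by-point circle generation discussed in this abstract, the sketch below shows the classic midpoint circle algorithm for one octant. It is the textbook single-point method, not the Kuzmin algorithm or the double-point algorithm proposed in the paper; the function name is an assumption.

```python
def midpoint_circle_octant(r: int):
    """Return the pixels of the octant from (0, r) to the 45-degree diagonal."""
    x, y = 0, r
    d = 1 - r  # integer decision variable for the midpoint test
    points = []
    while x <= y:
        points.append((x, y))
        if d < 0:
            d += 2 * x + 3          # midpoint inside: keep y
        else:
            d += 2 * (x - y) + 5    # midpoint outside: step y inward
            y -= 1
        x += 1
    return points


octant = midpoint_circle_octant(5)
```

The remaining seven octants follow by symmetry, and every update uses only integer additions and a doubling that hardware can implement as a shift.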

17.

Efficient multiple-precision integer division algorithm

Debapriyay Mukhopadhyay Subhas C. Nandy 《Information Processing Letters》2014

The design and implementation of a division algorithm is one of the most complicated problems in multi-precision arithmetic. Huang et al. [1] proposed an efficient multi-precision integer division algorithm, and experimentally showed that it is about three times faster than the most popular algorithms proposed by Knuth [2] and Smith [3]. This paper reports a bug in the algorithm of Huang et al. [1], and suggests the necessary corrections. The theoretical correctness proof of the proposed algorithm is also given. The resulting algorithm remains as fast as that of [1].

18.

MorphoSys reconfigurable hardware for cryptography: the Twofish case (total citations: 1, self-citations: 0, citations by others: 1)

Sohaib Majzoub Hassan Diab 《The Journal of supercomputing》2012,59(1):22-41

This paper presents the mapping and performance analysis of the Twofish algorithm on MorphoSys. MorphoSys is a reconfigurable architecture that can provide high performance compared to custom hardware and yet preserves a level of flexibility compared to general-purpose processors. With today's high demand for secure data transfer mediums including wired and wireless networks, there is a growing demand for real-time implementation of cryptographic algorithms. The Twofish algorithm, one of the five AES finalists, was chosen because it is a computationally intensive algorithm. It requires lookup tables and logical and arithmetic computations that demand high flexibility and performance, so it is a well-suited algorithm to map in order to evaluate such hardware.

19.

A General Multi-step Algorithm for Voxel Traversing Along a Line

Y. K. Liu H. Y. Song B. Žalik 《Computer Graphics Forum》2008,27(1):73-80

Traversing voxels along a three-dimensional (3D) line is one of the most fundamental algorithms for voxel-based applications. This paper presents a new 6-connectivity integer algorithm for this task. The proposed algorithm accepts voxels having different sizes in the x, y and z directions. To explain the idea of the proposed approach, a 2D algorithm is first considered and then extended to 3D. The algorithm is multi-step, as up to three voxels may be added in one iteration. It accepts both integer and floating-point input. The new algorithm was compared to other popular voxel traversing algorithms. Counting the number of arithmetic operations showed that the proposed algorithm requires the fewest operations per traversed voxel. A comparison of CPU time spent using either integer or floating-point arithmetic confirms that the proposed algorithm is the most efficient. The algorithm is simple and compact, which also makes it attractive for hardware implementation.

20.

The use of configurable computing for computational kernels in scientific simulations

《Future Generation Computer Systems》2006,22(1-2):67-79

In many scientific simulation codes, the bulk of the floating-point arithmetic required is done by a small number of compact computational kernels. In this paper, we explore the potential use of configurable computers to instantiate the hardware required for such kernels and, thus, improve their performance. We present algorithms and analysis for two such kernels: fast, problem-specific multipliers and the efficient evaluation of Taylor series. A novel aspect of the algorithm for Taylor series evaluation is that it takes advantage of the variable precision arithmetic available to a configurable computer. Experimental results obtained on a Xilinx field-programmable gate array (FPGA) are presented for the proposed algorithms.
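
One common way to evaluate a truncated Taylor series with few multiplications is Horner's scheme, sketched below for the exponential function. This is a generic software illustration in fixed double precision, not the variable-precision algorithm the paper presents; the function name and the default term count are assumptions.

```python
import math


def taylor_exp(x: float, terms: int = 20) -> float:
    """Evaluate the first `terms` terms of the Taylor series of exp(x) at 0."""
    coeffs = [1.0 / math.factorial(k) for k in range(terms)]
    result = 0.0
    # Horner's scheme: one multiply and one add per coefficient,
    # starting from the highest-order term.
    for c in reversed(coeffs):
        result = result * x + c
    return result


approx_e = taylor_exp(1.0)
```

The high-order terms contribute little to the final sum, which hints at why reduced precision for those stages can be attractive on configurable hardware.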