A floating point conversion algorithm for mixed precision computations

Authors:	Choon Lih HOO, Sallehuddin Mohamed HARIS, Nik Abdullah Nik MOHAMED

Affiliation:	(Department of Mechanical and Materials Engineering, Universiti Kebangsaan Malaysia, UKM Bangi 43600, Malaysia)

Abstract:	Floating point is the most commonly used real number representation in digital computation because of its high precision. It is used on computers and in single-chip applications such as DSP chips. Double precision (64-bit) representations allow a wider range of real numbers to be represented, but single precision (32-bit) operations are more efficient. Recently, there has been increasing interest in mixed precision computations, which exploit single precision efficiency when working with 64-bit numbers. This requires the ability to convert between the two formats. This paper presents an algorithm that converts floating point numbers from the 64-bit to the 32-bit representation. The algorithm was implemented in Verilog and tested on a field programmable gate array (FPGA) using the Quartus II DE2 board and an Agilent 16821A portable logic analyzer. Results indicate that the algorithm performs the conversion reliably and accurately within a constant execution time of 25 ns at a 20 MHz clock frequency, regardless of the number being converted.

Keywords:	Double precision; Single precision; FPGA; Verilog; HooHar algorithm
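
The abstract does not give the steps of the HooHar algorithm, so the following is only a software sketch of a generic 64- to 32-bit conversion under the IEEE 754 formats, not the authors' Verilog design. It assumes round-to-nearest-even, which the abstract does not specify, and the function name `double_to_single_bits` is hypothetical. It shows the operations such a conversion typically involves: splitting the double into sign, exponent and fraction, re-biasing the exponent from 1023 to 127, and narrowing the 52-bit fraction to 23 bits.

```python
import struct


def double_to_single_bits(x: float) -> int:
    """Return a 32-bit IEEE 754 bit pattern for the double x (illustrative helper)."""
    # Reinterpret the double as a 64-bit integer and split its three fields.
    bits64 = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits64 >> 63
    exp64 = (bits64 >> 52) & 0x7FF
    frac64 = bits64 & ((1 << 52) - 1)

    if exp64 == 0x7FF:
        # Infinity keeps a zero fraction; NaN is forced to a quiet NaN.
        frac32 = 0 if frac64 == 0 else 0x400000 | (frac64 >> 29)
        return (sign << 31) | (0xFF << 23) | frac32
    if exp64 == 0:
        # Zero and double subnormals lie far below the single range.
        return sign << 31

    exp32 = exp64 - 1023 + 127  # re-bias the exponent
    if exp32 >= 0xFF:
        return (sign << 31) | (0xFF << 23)  # overflow to infinity

    sig = (1 << 52) | frac64  # 53-bit significand with the implicit leading 1
    # Drop 29 bits for a normal result, more if the result is subnormal.
    shift = 29 if exp32 >= 1 else 29 + (1 - exp32)
    kept = sig >> shift
    rem = sig & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (kept & 1)):
        kept += 1  # assumed rounding mode: nearest, ties to even

    # kept still holds the leading 1 for normal results, so it is added to
    # (exp32 - 1) << 23; a rounding carry then moves into the exponent field.
    base = max(exp32 - 1, 0) << 23
    return (sign << 31) | (base + kept)
```

For example, `hex(double_to_single_bits(0.1))` evaluates to `0x3dcccccd`. A hardware version would typically perform the same field manipulations combinationally, which is consistent with the constant execution time reported in the abstract, although the paper's actual datapath may differ.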
This article is indexed in CNKI and other databases.
