1.
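Optimized Design of a Fast Floating-Point Adder (total citations: 3; self-citations: 0; citations by others: 3)
Floating-point numbers give an arithmetic unit high representable precision and a wide dynamic range, and floating-point arithmetic has become an indispensable part of modern computing programs. Floating-point addition is the most frequently used floating-point operation, so the performance of the floating-point adder affects the floating-point processing capability of the whole CPU. Starting from an analysis of the basic algorithm for floating-point addition and subtraction, the paper introduces a new algorithm, the three-datapath floating-point addition algorithm, concentrates on the design of the integer adder and the shifter, and optimizes the design of a 32-bit floating-point adder.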
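The basic add/subtract algorithm that the abstract takes as its starting point (align, add or subtract, normalize, round) can be sketched in software. This is a minimal illustration only, not the paper's three-datapath design: it assumes IEEE 754 single precision, normal operands, and round-to-nearest-even, and it uses Python's unbounded integers in place of the guard, round, and sticky bits a hardware shifter would keep. The names `fp32_add`, `f32_to_bits`, and `bits_to_f32` are hypothetical.

```python
import struct

def f32_to_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(w):
    """Return the float encoded by a 32-bit single-precision pattern."""
    return struct.unpack("<f", struct.pack("<I", w))[0]

def _unpack(w):
    sign = w >> 31
    exp = (w >> 23) & 0xFF
    if exp == 0 or exp == 0xFF:
        # Assumption: zeros, subnormals, infinities and NaNs are out of scope.
        raise ValueError("sketch handles normal numbers only")
    return sign, exp, (w & 0x7FFFFF) | 0x800000  # restore the hidden 1

def fp32_add(a_bits, b_bits):
    sa, ea, ma = _unpack(a_bits)
    sb, eb, mb = _unpack(b_bits)
    # Order the operands so that a has the larger magnitude.
    if (ea, ma) < (eb, mb):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    # Step 1: align. Shifting the larger significand left keeps every bit;
    # hardware would shift the smaller one right and track sticky bits.
    big = ma << (ea - eb)
    # Step 2: add or subtract the significands according to the signs.
    total = big + mb if sa == sb else big - mb
    if total == 0:
        return 0  # exact cancellation gives +0
    # Step 3: normalize to a 24-bit significand.
    shift = total.bit_length() - 24
    if shift > 0:
        q, rem = total >> shift, total & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        # Step 4: round to nearest, ties to even.
        if rem > half or (rem == half and q & 1):
            q += 1
            if q == 1 << 24:  # rounding carried out of the significand
                q >>= 1
                shift += 1
    else:
        q = total << -shift
    exp = eb + shift
    if not 0 < exp < 0xFF:
        raise OverflowError("result is outside the normal range")
    return (sa << 31) | (exp << 23) | (q & 0x7FFFFF)
```

For example, `bits_to_f32(fp32_add(f32_to_bits(1.5), f32_to_bits(2.25)))` evaluates to `3.75`.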
2.
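With the development of integrated-circuit technology, electronic design automation has gradually become an important design approach and is now widely used in digital circuits, digital signal processing systems, and many other fields. This paper presents a floating-point FFT designed in VHDL. The design uses a radix-2 algorithm and the 32-bit single-precision binary floating-point format, and the main controller is modeled as a state machine. The whole design was developed with the ISE 5.3 series software from Xilinx and follows a structured design approach. The complete design passed simulation and verification in Modelsim, and code coverage across more than twenty modules reached 100%. The results show that the FFT processor implemented in VHDL quickly completes the fast Fourier transform of floating-point data, and the code coverage indicates that the system was tested fairly thoroughly. The system can be extended to 16-point and 32-point floating-point FFT operations.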
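As a software reference for the radix-2 algorithm named in this abstract, the following is a minimal sketch of an iterative decimation-in-time FFT with bit-reversed input ordering. It is not the paper's VHDL design; the function names are hypothetical, and Python complex numbers (double precision) stand in for the 32-bit floating-point datapath.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_radix2(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Reorder the input so each stage can work in place.
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u = a[start + k]
                t = w * a[start + k + half]
                # One radix-2 butterfly.
                a[start + k] = u + t
                a[start + k + half] = u - t
        size *= 2
    return a
```

A 16-point or 32-point transform, the sizes mentioned in the abstract, needs 4 or 5 such butterfly stages respectively.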
3.
4.
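Design of the LSRISC 32-bit Floating-Point Array Multiplier (total citations: 5; self-citations: 2; citations by others: 3)
This article presents the design of the 32-bit floating-point multiplier in LSRISC. It can perform multiplication of 32-bit fixed-point integers and ordinal numbers, as well as multiplication of single-extended-precision floating-point data as specified by IEEE 754.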
5.
6.
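Research on the Design of a 32-bit Floating-Point Embedded MCU (total citations: 1; self-citations: 2; citations by others: 1)
This paper presents the design and implementation of a 32-bit floating-point embedded MCU based on a RISC architecture. The MCU contains 128 kbit of SRAM and uses a Harvard architecture, a four-stage instruction pipeline, a 32-bit instruction word, and a 43-bit internal data word. Multiple fast registers inside the MCU, together with hardwired logic in place of microprogrammed control, speed up the microprocessor and improve instruction execution efficiency. The design also writes registers synchronously and reads them asynchronously to avoid data hazards.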
7.
8.
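This article presents the architecture and design of the floating-point unit of the 32-bit RISC microprocessor "龙腾R2" (Longteng R2), focusing on the design of a high-performance floating-point pipeline with out-of-order execution and out-of-order completion. To support precise interrupts in the pipeline, it adopts a floating-point exception prediction method based on operand exponents and operation type, and the pipeline's issue policy is decided by the prediction result. Synthesis results with a 0.18 μm standard-cell library show that, compared with floating-point units that use in-order control or are based on Tomasulo's algorithm, the FPU built with this method achieves good results at a small cost in hardware area and meets the functional and timing requirements.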
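The abstract does not give the prediction rule itself, so the following is only a hypothetical sketch of what exponent-based exception prediction can look like. It assumes IEEE 754 single precision and normal operands, and it is deliberately conservative: it may flag an operation that turns out to be safe, but it never clears a multiply or add that overflows, or a multiply whose result falls below the normal range. The names `may_raise_exception` and `exponent_field` are illustrative.

```python
import struct

BIAS = 127      # single-precision exponent bias
EXP_MAX = 254   # largest biased exponent of a finite number

def exponent_field(x):
    """Biased exponent field of x when encoded as IEEE 754 single precision."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF

def may_raise_exception(op, a, b):
    """Conservative overflow/underflow prediction from exponents and opcode."""
    ea, eb = exponent_field(a), exponent_field(b)
    if op == "mul":
        # The result's biased exponent is e or e + 1, depending on whether
        # the significand product reaches 2.
        e = ea + eb - BIAS
        return e + 1 > EXP_MAX or e < 1
    if op == "add":
        # A carry out of the significand adds at most 1 to the larger
        # exponent; only overflow is predicted for addition in this sketch.
        return max(ea, eb) + 1 > EXP_MAX
    raise ValueError("unsupported operation")
```

An issue stage could, for example, let instructions predicted safe complete out of order and serialize only the flagged ones; that policy is an assumption here, not a description of the processor in the abstract.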
9.
10.
11.
12.
This paper presents a high-throughput, size-configurable floating-point (FP) Fast Fourier Transform (FFT) processor that implements 8-parallel multi-path delay feedback (MDF) functions suitable for real-time radar imaging systems. In floating-point FFT design, achieving high throughput within restricted area and power budgets poses a greater challenge because FP operations are more complex to realize than their fixed-point counterparts. To address these issues, a novel mixed-radix FFT algorithm featuring a single-sided binary-tree decomposition strategy is proposed to effectively contain the complexity of multiplications for any 2^k-point FFT. To this end, the parallel-processing twiddle factor generator and the dual addition-and-rounding fused FP arithmetic units are optimized to meet the high accuracy demand in computation and the low power budget in implementation. The proposed FP FFT processor has been designed in silicon based on SMIC's 28 nm CMOS technology with an active area of 1.39 mm². The prototype design delivers a throughput of 4 GSample/s at 500 MHz, at a peak power consumption of 84.2 mW. The proposed design approach thus improves power efficiency by approximately 14 times on average over previously reported FP FFT processors.
13.
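This paper proposes a novel mixed-radix reconfigurable FFT processor composed of a new reconfigurable butterfly unit that supports radix-2/3 FFT and a multi-path parallel conflict-free memory, which together achieve multi-path data parallelism and continuous operation during the FFT. The design reaches a maximum frequency of 1.06 GHz in the TSMC 28 nm process, and a hardware test system for the mixed-radix FFT processor was built on a Xilinx XC7V2000T FPGA. FPGA hardware test results show that the design supports radix-2, radix-3, and mixed radix-2/3 FFT modes, that execution speed reaches the theoretical cycle count for the given number of butterfly multipliers, and that for single-precision floating-point numbers the processor delivers a result accuracy of 10^-5.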
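To make the radix-2/3 decomposition concrete, here is a minimal recursive sketch of a mixed-radix decimation-in-time FFT for lengths of the form 2^a · 3^b. It only illustrates the arithmetic; the reconfigurable butterfly unit and conflict-free memory described above are not modeled, and the function name is hypothetical.

```python
import cmath

def fft_mixed_radix(x):
    """Mixed radix-2/3 FFT for lengths whose only prime factors are 2 and 3."""
    n = len(x)
    if n == 0:
        raise ValueError("input must not be empty")
    if n == 1:
        return [complex(x[0])]
    # Pick the radix for this stage: 2 if possible, otherwise 3.
    for r in (2, 3):
        if n % r == 0:
            break
    else:
        raise ValueError("length must factor into powers of 2 and 3")
    m = n // r
    # Transform the r interleaved sub-sequences x[j], x[j + r], ...
    subs = [fft_mixed_radix(x[j::r]) for j in range(r)]
    out = [0j] * n
    for k in range(m):
        # Apply the twiddle factors W_n^(j*k) to the sub-transform outputs.
        t = [subs[j][k] * cmath.exp(-2j * cmath.pi * j * k / n)
             for j in range(r)]
        # Radix-r butterfly: an r-point DFT across the twiddled values.
        for q in range(r):
            out[k + q * m] = sum(
                t[j] * cmath.exp(-2j * cmath.pi * j * q / r)
                for j in range(r))
    return out
```

With `r = 2` the inner loop reduces to the familiar add/subtract butterfly; with `r = 3` it is a 3-point DFT, which is the extra case a radix-2/3 butterfly unit has to cover.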
14.
With a huge increase in demand for various kinds of compute-intensive applications in electronic systems, researchers have focused on coarse-grained reconfigurable architectures because of their advantages: high performance and flexibility. This paper presents FloRA, a coarse-grained reconfigurable architecture with floating-point support. A two-dimensional array of integer processing elements in FloRA is configured at run-time to perform floating-point operations as well as integer operations. Fabricated using 130 nm process, the total area overhead due to additional hardware for floating-point operations is about 7.4% compared to the previous architecture which does not support floating-point operations. The fabricated chip runs at 125 MHz clock frequency and 1.2 V power supply. Experiments show 11.6× speedup on average compared to ARM9 with a vector-floating-point unit for integer-only benchmark programs as well as programs containing floating-point operations. Compared with other similar approaches including XPP and Butter, the proposed architecture shows much higher performance for integer applications, while maintaining about half the performance of Butter for floating-point applications.
15.
Embedded systems designers often use fixed-point instead of floating-point due to the performance and area overhead of floating-point
units. If the range of floating-point representation is required, the system may use a software-based floating-point library
on an integer-only processor to save area—at the cost of much lower performance. Instead, we propose a Fractured Floating
Point Unit (FFPU)—a hybrid solution that uses a set of custom hardware instructions to accelerate software-based floating-point
emulation. An FFPU is intended as a compromise between software libraries and full FPUs in terms of both area and performance.
We present four potential 32-bit FFPU designs for a Nios II soft processor. We compare their performance and area to the baseline
Nios II, as well as a Nios II with a complete FPU. We show that an FFPU can improve various floating-point operations, including
improving addition and subtraction performance by 24 to 52 percent over the baseline. This performance comes at a resource
cost of only an 11 to 29 percent ALM increase, and no increase in DSP blocks.
16.
Earl E. Swartzlander Jr. 《Journal of Signal Processing Systems》2008,53(1-2):3-14
This paper provides a personal perspective on developments in the implementation of two systolic fast Fourier transform processors over the last 25 years and identifies some of the lessons learned. This has been a period of tremendous advancements in integrated circuit technology that is demonstrated by the resulting processors. The first processor is the Modular Transform Processor that was developed at TRW in the 1982–1984 time frame using VLSI technology. It is a set of six large circuit boards that computes 4,096-point fast Fourier transforms using 22-bit floating-point arithmetic at sustained data rates of 40 MSPS. The second processor is a single ASIC chip systolic FFT processor developed by the Mayo Foundation in the 2001–2002 time frame that computes 4,096-point FFTs using 16-bit fixed-point arithmetic at sustained data rates of 200 MSPS. Some thoughts on the future directions of systolic FFT processor development are offered. Future systems will compute large transforms (e.g., 16 K-point to 1 M-point) at high data rates (e.g., 500 MSPS to 1 GSPS), will employ more precise arithmetic (e.g., 32-bit single precision IEEE Standard floating-point arithmetic), will consume very low power (e.g., on the order of one watt) and will be realized on a single chip.
17.
The Letter demonstrates that a 10 bit reduced-complexity VLSI circuit can be used in place of a 32 bit floating-point processor to speed up some neural network applications, reducing circuit area and power consumption by 88% with a negligible increase in RMS error. Applications were executed on a radial basis function neurocomputer using the reduced-complexity circuit implemented with FPGA technology. One application produced better results than had been previously obtained for a NASA data set using either neural network or non-neural network approaches.
18.
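High-throughput, flexibly reconfigurable floating-point fast Fourier transform (FFT) processors can meet the needs of applications such as cutting-edge real-time radar imaging and high-precision scientific computing. Compared with fixed-point FFT, floating-point arithmetic is more complex, which sharpens the conflict between the throughput of a floating-point FFT and its implementation area and power consumption. To reduce the computational complexity, a large-size FFT is first decomposed into several cascaded small-size radix-2^k sub-stages, and optimized mixed-radix algorithms are proposed for 128/256/512/1024/2048-point FFTs. In addition, building on a proposed new fused add-subtract and dot-product arithmetic unit that supports two floating-point modes, single-channel single-precision and dual-channel half-precision, a high-throughput dual-mode variable-size floating-point FFT processor architecture is proposed for the first time and is designed and implemented in a 28 nm standard CMOS process. Experimental results show that the throughput and average output signal-to-quantization-noise ratio are 3.478 GSample/s and 135 dB in single-channel single-precision mode, and 6.957 GSample/s and 60 dB in dual-channel half-precision mode. The normalized throughput-to-area ratio is about 12 times higher than that of other existing floating-point FFT implementations.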
19.
《Chinese Journal of Electronics》2016,(6):1063-1070
Fast Fourier transform (FFT) accelerators and the Coordinate rotation digital computer (CORDIC) algorithm play important roles in signal processing. We propose a configurable floating-point FFT accelerator based on CORDIC rotation, in which twiddle direction prediction is presented to reduce hardware cost and twiddle angles are generated in real time to save memory. To finish CORDIC rotation efficiently, a novel approach is presented that uses segmented-parallel iteration and compress iteration based on CSA, and redundant CORDIC is used to reduce the latency of each iteration. To prove the efficiency of our FFT accelerator, four FFT accelerators are prototyped in an FPGA chip to perform a batch FFT. Experimental results show that our structure, which is composed of four butterfly units and finishes FFTs with sizes ranging from 64 to 8192 points, occupies 33230 (3%) REGs and 143006 (30%) LUTs. The clock frequency can reach 122 MHz. The resources of the double-precision FFT are only about 2.5 times those of single-precision, while the theoretical value is 4. What's more, only 13331 cycles are required to implement an 8192-point double-precision FFT with four butterfly units in parallel.
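For readers unfamiliar with CORDIC rotation, the following is a minimal sketch of the textbook rotation-mode iteration, which replaces a twiddle-factor multiplication with shift-and-add steps. It is not the segmented-parallel or redundant variant the abstract describes; the function name and the default of 24 iterations are assumptions, and floating-point multiplications by `2.0 ** -i` stand in for hardware shifts.

```python
import math

def cordic_rotate(x, y, angle, iterations=24):
    """Rotate the vector (x, y) by `angle` radians using CORDIC.

    Converges for |angle| up to about 1.74 rad; larger angles would need
    a quadrant correction first, which this sketch leaves out.
    """
    # Elementary rotation angles atan(2^-i) and the accumulated gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotation direction for this step
        # Micro-rotation; the right-hand side uses the old x and y.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    # Compensate the constant CORDIC gain (about 1.6468).
    return x / gain, y / gain
```

Rotating `(1.0, 0.0)` by an angle θ yields approximately `(cos θ, sin θ)`, which is the twiddle factor an FFT butterfly needs.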
20.
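In a DSP chip, the speed of the floating-point adder limits the operating speed of the whole chip, and the speed of the LOD circuit inside the floating-point adder is in turn the bottleneck of the adder's speed. The performance of the whole DSP chip can therefore be improved by improving the LOD circuit. We design the LOD from two aspects, its structure and its logic, and implement a fast, efficient LOD circuit. It targets the TMS320C3X extended-precision floating-point data format.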
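Reading LOD as leading-one detector, the circuit's function is to find the most significant 1 in the adder's result so the normalization shift is known. The sketch below models one common structure, a binary-tree search that halves the field at each level; it is not the circuit from the abstract, and the function name and the 32-bit default width are assumptions.

```python
def leading_one_position(value, width=32):
    """Return the bit index of the most significant 1, or None for zero.

    `width` must be a power of two and `value` must fit in `width` bits.
    """
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in width bits")
    if value == 0:
        return None
    position = 0
    span = width
    while span > 1:
        half = span // 2
        upper = value >> half
        if upper:
            # The leading one is in the upper half: keep it, add its offset.
            value = upper
            position += half
        # Otherwise the value already fits in the lower half.
        span = half
    return position
```

The normalization shift for a `width`-bit field is then `width - 1 - leading_one_position(value, width)`, and each loop iteration corresponds to one level of the tree, so a 32-bit field is resolved in 5 levels.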