Similar literature
1.
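邓学禹 (Deng Xueyu), 《电讯技术》 (Telecommunication Engineering), 2005, 45(2): 188-191
To improve the real-time performance of fast Fourier transform (FFT) processing, this paper exploits the abundant logic resources and high operating speed of field-programmable gate arrays (FPGAs), together with the staged structure of the FFT algorithm, to design a high-speed, high-order FFT that operates as a pipeline. Using the design method described, a 1,024-point decimation-in-time FFT with streaming data input and output, running at a clock frequency above 50 MHz, was implemented on a Xilinx Virtex-II series FPGA.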

2.
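FFT design and simulation based on an FPGA high-precision floating-point arithmetic unit   (Cited by: 1; self-citations: 0; cited by others: 1)
张雪姣 (Zhang Xuejiao), 伍萍辉 (Wu Pinghui), 《电子科技》 (Electronic Science and Technology), 2011, 24(12): 88-90
Based on the IEEE floating-point representation format and the FFT algorithm, a method for implementing a radix-2 FFT on an FPGA is proposed, and an FFT built on an FPGA high-precision floating-point arithmetic unit is designed. The butterfly operation and the address generation unit are described in VHDL, and the simulated waveforms represent the output results essentially correctly.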

3.
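夏明赟 (Xia Mingyun), 蒋涛 (Jiang Tao), 《通信技术》 (Communications Technology), 2012, 45(7): 113-115
Because the short-time Fourier transform (STFT) is algorithmically simple, fast to compute, and easy to implement, it is increasingly used in image processing, speech analysis, signal detection, and parameter estimation. After analyzing the principle of the STFT algorithm, the authors design a high-speed STFT implementation structure based on a field-programmable gate array (FPGA). The structure makes full use of the computational characteristics of the butterfly unit, reduces computational complexity while meeting the time-resolution and frequency-resolution requirements, and saves hardware resources at a high operating clock rate.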

4.
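Building on the low complexity of the R2²SDF (radix-2², single-path delay feedback) algorithm, a general FFT algorithm is derived from it. The method applies to any 2^n-point FFT. It uses a pipelined structure to meet real-time data-processing requirements.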

5.
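The DFT is a widely used mathematical transform, and MATLAB is a powerful scientific computing language. The FFT function provided by MATLAB solves the problem of computing the DFT quickly, but because it is a built-in function, its software implementation cannot be inspected. Taking the radix-2 decimation-in-time FFT as an example, and following the principles and regularities of the fast Fourier transform, the article draws a program flowchart of the algorithm, lists a program that implements it in the MATLAB environment, and establishes...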
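The article's own MATLAB listing is not reproduced in this abstract. As a rough illustration of the radix-2 decimation-in-time idea it refers to, the following Python sketch splits the input into even- and odd-indexed halves and recombines them with twiddle factors. The function names `fft_radix2_dit` and `dft_direct` are illustrative assumptions, and the recursive form is chosen for brevity; it is not the flowchart-based program from the article.

```python
import cmath


def fft_radix2_dit(x):
    """Recursive radix-2 decimation-in-time FFT (illustrative sketch).

    The input length must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Decimation in time: transform even- and odd-indexed samples separately.
    even = fft_radix2_dit(x[0::2])
    odd = fft_radix2_dit(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_n^k = exp(-2*pi*j*k/n) applied to the odd half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t            # upper butterfly output
        out[k + n // 2] = even[k] - t   # lower butterfly output
    return out


def dft_direct(x):
    """Direct O(n^2) DFT, used only as a reference to check the FFT."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
```

Comparing `fft_radix2_dit(x)` with `dft_direct(x)` on a short power-of-two input is a simple way to check that the butterflies are wired correctly.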

6.
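By combining a pipelined structure with ping-pong RAM, the decimation-in-time radix-4 algorithm is improved, and a VLSI design of an efficient pipelined FFT (IFFT) processor suited to OFDM systems is realized. At a clock frequency of 125 MHz, one 1024-point complex FFT with a 16-bit word length takes 49.57 μs.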

7.
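Based on an analysis of fast Fourier transform theory, a decimation-in-frequency radix-4 FFT design for FPGAs is proposed. To address the frequent multiplications by multiple twiddle factors that butterfly operations require in existing FPGA FFT implementations, an improved method is presented that reduces the number of twiddle-factor multiplications and the storage they need, speeding up the butterfly operation. The address mapping method designed here yields the storage address of the required data without any computation. Combined with a ping-pong structure and pipelining, it raises the speed of the FPGA implementation of the fast Fourier transform (FFT) and offers a useful reference for implementing FFT algorithms.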

8.
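High-throughput, flexibly reconfigurable floating-point fast Fourier transform (FFT) processors can meet the needs of applications such as real-time imaging in advanced radar and high-precision scientific computing. Compared with fixed-point FFTs, floating-point operations are more complex, which makes the conflict between the throughput of a floating-point FFT and its area and power consumption especially acute. To reduce computational complexity, a large-point FFT is first decomposed into several cascaded small-point radix-2^k sub-stages, and optimized mixed-radix algorithms are proposed for 128/256/512/1024/2048-point FFTs. In addition, using a proposed new fused add-subtract and dot-product arithmetic unit that supports both a single-channel single-precision mode and a dual-channel half-precision mode, a high-throughput, dual-mode, variable-length floating-point FFT processor architecture is proposed for the first time, and it is designed and implemented in a 28 nm standard CMOS process. Experimental results show that the throughput and average output signal-to-quantization-noise ratio are 3.478 GSample/s and 135 dB in the single-channel single-precision mode, and 6.957 GSample/s and 60 dB in the dual-channel half-precision mode. The normalized throughput-to-area ratio is about 12 times higher than that of other existing floating-point FFT implementations.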

9.
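For the implementation of a high-speed 64-point FFT (fast Fourier transform) processing chip, the principle of the FFT is analyzed and an improved FFT signal flow graph is introduced based on it. The functional partitioning of the modules in the FFT processor system is described. Based on the processor structure and its special addressing scheme, the controller, dual data buffers, address generator, butterfly unit, and I/O control modules are designed at the register-transfer level (RTL) in Verilog HDL. Each module and the whole system are functionally simulated and verified in ModelSim, and simulation waveforms of some key modules are given. The RTL design pays attention to hardware implementation and circuit synthesizability, so as to ensure that the resulting hardware matches the expected performance.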

10.
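An efficient FFT (fast Fourier transform) pipeline structure is described. It not only raises the data rate but also effectively reduces the size of the design. As a key part of an OFDM (orthogonal frequency division multiplexing) system, the FFT design determines the implementation size of the whole system. As one application, the author used this FFT structure in a DVB-T receiver to demodulate both the 2K and 8K modes. The structure can also be applied easily wherever an FFT is used, and it readily supports multiple coexisting modes.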

11.
This work describes a floating-point arithmetic unit based on the CORDIC algorithm. The unit computes a full set of high level arithmetic and elementary functions: multiplication, division, (co)sine, hyperbolic (co)sine, square root, natural logarithm, inverse (hyperbolic) tangent, vector norm, and phase. The chip has been integrated in 1.6 μm double-metal n-well CMOS technology and achieves a normalized peak performance of 220 MFLOPS.

12.
Electronics Letters, 2002, 38(16): 857-858
A new floating-point (FP) normalisation unit scheme is presented that achieves enhanced performance by merging a leading zero counter (LZC) and a normalisation shifter. The LZC and the shift decoder are combined by using NOR planes to generate control signals directly to the normalisation shifter. The chip has been fabricated with a five-metal 0.18 μm CMOS process and performs the 64 bit FP normalisation within 1.4 ns.
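To make the two functions that the abstract merges more concrete, here is a minimal software sketch of a leading zero count followed by a normalisation shift. It models only the arithmetic behaviour, not the NOR-plane circuit from the letter. The names `leading_zero_count` and `normalise`, the `width` parameter, and the treatment of a zero mantissa are assumptions made for this illustration.

```python
def leading_zero_count(value, width=64):
    """Count the zero bits above the most significant 1 in a width-bit word."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    return width - value.bit_length()


def normalise(mantissa, exponent, width=64):
    """Shift the mantissa left until its top bit is 1 and adjust the exponent.

    A zero mantissa is returned unchanged here; a real FP unit would treat
    zero as a special case.
    """
    if mantissa == 0:
        return 0, exponent
    shift = leading_zero_count(mantissa, width)
    # Each left shift doubles the mantissa, so the exponent drops by one.
    return mantissa << shift, exponent - shift
```

In this sequential form the shift amount must be known before the shift starts, which is the dependency the merged LZC and shifter scheme is described as shortening.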

13.
With a huge increase in demand for various kinds of compute-intensive applications in electronic systems, researchers have focused on coarse-grained reconfigurable architectures because of their advantages: high performance and flexibility. This paper presents FloRA, a coarse-grained reconfigurable architecture with floating-point support. A two-dimensional array of integer processing elements in FloRA is configured at run-time to perform floating-point operations as well as integer operations. Fabricated in a 130 nm process, the chip has a total area overhead due to the additional hardware for floating-point operations of about 7.4% compared to the previous architecture, which does not support floating-point operations. The fabricated chip runs at a 125 MHz clock frequency with a 1.2 V power supply. Experiments show 11.6× speedup on average compared to ARM9 with a vector-floating-point unit for integer-only benchmark programs as well as programs containing floating-point operations. Compared with other similar approaches including XPP and Butter, the proposed architecture shows much higher performance for integer applications, while maintaining about half the performance of Butter for floating-point applications.

14.
A CMOS pipelined floating-point processing unit (FPU) for superscalar processors is described. It is fabricated using a 0.5 μm CMOS triple-metal-layer technology on a 61 mm² die. The FPU has two execution modes to meet precise scientific computations and real-time applications. It can start two FPU operations in each cycle, and this achieves a peak performance of 160 MFLOPS double or single precision with an 80 MHz clock. Furthermore, the original computation mode, twin single-precision computation, doubles the peak performance and delivers 320 MFLOPS single precision. Its full bypass reduces the latency of operations, including load and store, and achieves an effective throughput even in nonvectorizable computations. An out-of-order completion is provided by using a new exception prediction method and a pipeline stall technique.

15.
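田祎 (Tian Yi), 颜军 (Yan Jun), 《电子设计工程》 (Electronic Design Engineering), 2012, 20(12): 13-15, 20
The floating-point adder is the core arithmetic component of a floating-point unit and the basis for executing the operations of floating-point instructions, so optimizing its design is key to improving the speed and accuracy of floating-point computation. The article presents a design method from the perspectives of the floating-point adder algorithm and its circuit implementation, and designs and verifies the adder in VHDL using Quartus II. The adder uses a state machine to control the operation, which effectively reduces power consumption, increases speed, and improves performance.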

16.
A simplified synthesis of transmission lines with a tree structure   (Cited by: 1; self-citations: 0; cited by others: 1)
The limiting factor for high-performance systems is being set by interconnection delay rather than transistor switching speed. The advances in circuit speed and density are placing increasing demands on the performance of interconnections, for example chip-to-chip interconnection on multichip modules. To address this extremely important and timely research area, we analyze in this paper the circuit property of a generic distributed RLC tree which models interconnections in high-speed IC chips. The presented result can be used to calculate the waveform and delay in an RLC tree. The result on the RLC tree is then extended to the case of a tree consisting of transmission lines. Based on an analytical approach, a two-pole circuit approximation is presented to provide a closed-form solution. The approximation reveals the relationship between circuit performance and the design parameters, which is essential to IC layout designs. A simplified formula is derived to evaluate the performance of VLSI layout.

17.
The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical single-precision multiply-add operations. Since exact rounding, exceptions, and de-norm number handling are not important to multimedia applications, IEEE correctness on the single-precision floating-point numbers is sacrificed for performance and simple design. It employs fine-grained clock gating for power saving. The design has 768K transistors in 1.3 mm², fabricated SOI in 90-nm technology. Correct operations have been observed up to 5.6 GHz with 1.4 V and 56 °C, delivering 44.8 GFlops. Architecture, logic, circuits, and integration are codesigned to meet the performance, power, and area goals.
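The quoted peak rate is consistent with a simple count. The sketch below assumes, which the abstract does not state explicitly, that each of the four SIMD lanes completes one single-precision multiply-add per cycle and that a multiply-add is counted as two floating-point operations:

```latex
\[
4~\text{lanes} \times 2~\frac{\text{flops}}{\text{lane}\cdot\text{cycle}} \times 5.6~\text{GHz} = 44.8~\text{GFlops}
\]
```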

18.
A vector unit for high-performance three-dimensional graphics computing has been developed. We implement four floating-point multiply-accumulate units, which execute multiply-add operations with one throughput; one floating-point divide/square root unit, which executes division and square-root operations with six cycles at 300 MHz; and one vector general-purpose register file, which has 128 bits × 32 words. The parallel execution of all units delivers a peak performance of 2.44 GFLOPS at 300 MHz.

19.
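A hard-core architecture for a floating-point digital signal processor (DSP) block is proposed that remains compatible with fixed-point operations while providing good support for floating-point operations. At present, the major field-programmable gate array (FPGA) vendors implement floating-point functions as soft cores: the floating-point algorithm is mapped onto the chip and realized with logic resources and DSP blocks. Compared with this conventional approach, the proposed hard core completes floating-point operations using only DSP blocks, without occupying other logic resources in the FPGA. The design takes load and delay effects fully into account and inserts multiple pipeline stages, significantly improving floating-point computing efficiency. The proposed floating-point DSP hard core is designed and implemented in a 28 nm process from Semiconductor Manufacturing International Corporation (SMIC, 中芯国际). Simulation results show that a single floating-point addition or multiplication achieves 0.4 Gflops.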

20.
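《信息技术》 (Information Technology), 2017, (9): 113-116
This paper studies the extended CORDIC algorithm and, on that basis, designs and implements a hardware accelerator for floating-point transcendental functions. In vectoring mode the accelerator computes three floating-point transcendental functions: arctangent, inverse hyperbolic tangent, and logarithm. The design uses the IEEE-754 single-precision floating-point format. The computation is divided into three data paths, X, Y, and Z, and comprises five processing steps: exponent alignment, mapping, iteration, compensation, and normalization. Experimental results show that on a Xilinx Virtex-6 device the circuit reaches a maximum operating frequency of 221 MHz, with a reduced number of computation cycles.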
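As a rough illustration of the vectoring mode mentioned above, the following Python sketch runs the basic circular CORDIC iteration on three variables that play the role of the X, Y, and Z data paths, driving y toward zero while z accumulates the arctangent. It is a textbook-style sketch in ordinary floating point, not the paper's extended algorithm or its alignment, mapping, compensation, and normalization steps. The name `cordic_arctan`, the iteration count, and the restriction to x > 0 are assumptions for this example.

```python
import math


def cordic_arctan(y, x, iterations=32):
    """Approximate atan(y / x) for x > 0 with circular CORDIC in vectoring mode.

    Hardware would use fixed-point shifts and a small lookup table of
    atan(2**-i) constants; math.atan stands in for that table here.
    """
    if x <= 0:
        raise ValueError("this sketch only covers x > 0")
    z = 0.0
    for i in range(iterations):
        # Rotate toward the x-axis: the direction is opposite to the sign of y.
        d = -1.0 if y > 0 else 1.0
        step = 2.0 ** -i
        # The right-hand side uses the old x and y for all three updates.
        x, y, z = x - d * y * step, y + d * x * step, z - d * math.atan(step)
    return z
```

For example, `cordic_arctan(1.0, 1.0)` should come out close to `math.pi / 4`. The scale factor that CORDIC introduces in x does not affect the angle, so no gain compensation is needed for this particular function.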
