1.

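Design and Implementation of an FFT Processor Based on Dynamic Reconfiguration (total citations: 3; self-citations: 1; citations by others: 2)

Pan Wei (潘伟), Liu Huan (刘欢), Li Guangjun (李广军) 《微电子学》 (Microelectronics) 2009,39(1)

A novel reconfigurable FFT processor based on dynamic partial reconfiguration (DPR) is proposed. Compared with conventional FFT designs, the approach greatly reduces reconfiguration time, and the processor can add or remove reconfigurable units dynamically. A novel FFT control algorithm keeps the area of the reconfigurable part very small. The processor architecture was synthesized and verified by post-synthesis simulation on a Xilinx Virtex-II Pro series FPGA. Compared with the Xilinx IP core, its computational efficiency is clearly higher, and it also provides dynamic reconfigurability that the IP core lacks.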

2.

Configurable Floating-Point FFT Accelerator on FPGA Based Multiple-Rotation CORDIC

《电子学报:英文版》 (English edition of Acta Electronica Sinica) 2016,(6):1063-1070

Fast Fourier transform (FFT) accelerators and the Coordinate rotation digital computer (CORDIC) algorithm play important roles in signal processing. We propose a configurable floating-point FFT accelerator based on CORDIC rotation, in which twiddle direction prediction is presented to reduce hardware cost and twiddle angles are generated in real time to save memory. To finish the CORDIC rotation efficiently, a novel approach is presented that uses segmented-parallel iteration and compress iteration based on CSA, and redundant CORDIC is used to reduce the latency of each iteration. To prove the efficiency of our FFT accelerator, four FFT accelerators are prototyped on an FPGA chip to perform a batch FFT. Experimental results show that our structure, which is composed of four butterfly units and computes FFTs with sizes ranging from 64 to 8192 points, occupies 33230 (3%) REGs and 143006 (30%) LUTs. The clock frequency can reach 122 MHz. The resources of the double-precision FFT are only about 2.5 times those of the single-precision FFT, while the theoretical value is 4. What's more, only 13331 cycles are required to compute an 8192-point double-precision FFT with four butterfly units in parallel.
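
To make the idea of computing twiddle factors by CORDIC rotation concrete, here is a minimal Python sketch of the conventional rotation-mode CORDIC. It is not the paper's multiple-rotation, redundant, or CSA-based scheme; the function names `cordic_rotate` and `twiddle` and the iteration count are assumptions for illustration, and floating-point multiplications by 2^-i stand in for the hardware shifts.

```python
import math

def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by `angle` radians using CORDIC micro-rotations.

    Assumes |angle| <= pi/2, which lies inside the convergence range
    of the conventional algorithm (about 1.74 rad).
    """
    # Elementary angles atan(2^-i); hardware would keep these in a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Constant gain accumulated by the un-normalised micro-rotations.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotation direction for this step
        # Multiplying by 2^-i corresponds to a right shift in hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x / gain, y / gain

def twiddle(k, n):
    """Approximate exp(-2j*pi*k/n) as (real, imag) for 0 <= k <= n/4."""
    return cordic_rotate(1.0, 0.0, -2.0 * math.pi * k / n)
```

Generating the angle `-2*pi*k/n` on the fly, as `twiddle` does, is the software analogue of producing twiddle angles in real time instead of reading stored twiddle factors from memory.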

3.

Parallel Memory Accessing for FFT Architectures

V. Kitsakis K. Nakos D. Reisis N. Vlassopoulos 《Journal of Signal Processing Systems》2018,90(11):1593-1607

The current paper introduces an efficient technique for parallel data addressing in FFT architectures performing in-place computations. The novel addressing organization provides parallel load and store of the data involved in radix-r butterfly computations and leads to an efficient architecture when r is a power of 2. The addressing scheme is based on a permutation of the FFT data, which leads to the improvement of the address generating circuit and the butterfly processor control. Moreover, the proposed technique is suitable for mixed radix applications, especially for radixes that are powers of 2 and straightforward continuous flow implementation. The paper presents the technique and the resulting FFT architecture and shows the advantages of the architecture compared to hitherto published results. The implementations on a Xilinx FPGA Virtex-7 VC707 of the in-place radix-8 FFT architectures with input sizes 64 and 512 complex points validate the results.
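
As background for what "parallel load and store" requires, the following Python sketch checks a classic conflict-free bank assignment for in-place radix-r FFTs: the bank of an address is the sum of its base-r digits modulo r. This is a well-known textbook scheme used here only for illustration, not the permutation proposed in the paper; all function names are hypothetical.

```python
def digits(addr, radix, ndigits):
    """Base-`radix` digits of `addr`, least significant first."""
    out = []
    for _ in range(ndigits):
        out.append(addr % radix)
        addr //= radix
    return out

def bank(addr, radix, ndigits):
    """Memory bank for `addr`: digit sum modulo the radix."""
    return sum(digits(addr, radix, ndigits)) % radix

def butterfly_groups(radix, ndigits, stage):
    """Yield the address groups of one in-place stage.

    The members of a group differ only in base-`radix` digit number `stage`.
    """
    n = radix ** ndigits
    weight = radix ** stage
    for base in range(n):
        if (base // weight) % radix == 0:
            yield [base + j * weight for j in range(radix)]

def conflict_free(radix, ndigits):
    """True if every butterfly in every stage touches `radix` distinct banks."""
    return all(
        len({bank(a, radix, ndigits) for a in group}) == radix
        for stage in range(ndigits)
        for group in butterfly_groups(radix, ndigits, stage)
    )
```

For the radix-8 sizes mentioned in the abstract, `conflict_free(8, 2)` covers 64 points and `conflict_free(8, 3)` covers 512 points. The property holds because the operands of one butterfly differ in a single digit that takes all r values, so their digit sums are distinct modulo r.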

4.

FPGA based multi-channel variable-length FFT implementation

WANG Jiawei YU Le YANG Haigang FENG Guanglang SUN Jiabin LUO Yang 《太赫兹科学与电子信息学报》2017,15(3):469-474

High-speed real-time digital frequency analysis, as in Synthetic Aperture Radar (SAR) processing and medical imaging, is a major application field of the Fast Fourier Transform (FFT). In SAR processing, the image size is typically 4 k × 4 k and has grown over the years. With real-time operation, extensibility, and reusability in mind, this paper proposes a Field Programmable Gate Array (FPGA) based multi-channel variable-length FFT architecture that adopts the radix-2 butterfly algorithm. The hardware implementation of the FFT is a partially reconfigurable architecture. Firstly, the proposed architecture is flexible in terms of chip area, speed, resource utilization, and power consumption. Secondly, it combines serial and parallel methods in its butterfly computations. Furthermore, at the system level, the architecture uses state processing in serial mode and data processing in parallel mode. When FPGA resources are sufficient, the serial-mode state processing is converted to pipeline mode, which achieves high throughput.

5.

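Research on the Implementation of a Configurable FFT IP Core Based on FPGA

Li Daxi (李大习) 《电子科技》 (Electronic Science and Technology) 2014,27(6):46-49,53

A configurable IP core for the FFT algorithm is implemented on an FPGA. Using a pipelined structure and a fast parallel algorithm, the design implements the butterfly operation and a 4 k-point FFT with freely configurable input length, data width, and decomposition radix. The core is written in Verilog, simulated with ModelSim, synthesized and downloaded with ISE, and verified on a Xilinx Virtex-5 xc5vfx70t device at a 200 MHz clock. Compared with other designs, its computational efficiency shows some advantage.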

6.

An Area Efficient FFT/IFFT Processor for MIMO-OFDM WLAN 802.11n

Bo Fu Paul Ampadu 《Journal of Signal Processing Systems》2009,56(1):59-68

A pipelined Fast Fourier Transform and its inverse (FFT/IFFT) processor, which utilizes hardware resources efficiently, is proposed for MIMO-OFDM WLAN 802.11n. Compared with a conventional MIMO-OFDM implementation, (in which as many FFT/IFFT processors as the number of transmit/receive antennas is used), the proposed architecture (using hardware sharing among multiple data sequences) reduces hardware complexity without sacrificing system throughput. Further, the proposed architecture can support 1–4 input data sequences with sequence lengths of 64 or 128, as needed. The FFT/IFFT processor is synthesized using TSMC 0.18 um CMOS technology and saves 25% area compared to a conventional implementation approach using radix-2³ algorithm. The proposed FFT/IFFT processor can be configured to improve power efficiency according to the number of input data sequences and the sequence length. The processor consumes 38 mW at 75 MHz for one input sequence with 64-point length; it consumes 87 mW at 75 MHz for four input sequences with length 128-point and can be efficiently used for IEEE 802.11n WLAN standard.

7.

A Low‐Complexity 128‐Point Mixed‐Radix FFT Processor for MB‐OFDM UWB Systems

Sang‐In Cho Kyu‐Min Kang 《ETRI Journal》2010,32(1):1-10

In this paper, we present a fast Fourier transform (FFT) processor with four parallel data paths for multiband orthogonal frequency‐division multiplexing ultra‐wideband systems. The proposed 128‐point FFT processor employs both a modified radix‐2⁴ algorithm and a radix‐2³ algorithm to significantly reduce the numbers of complex constant multipliers and complex Booth multipliers. It also employs substructure‐sharing multiplication units instead of constant multipliers to efficiently conduct multiplication operations with only addition and shift operations. The proposed FFT processor is implemented and tested using 0.18 µm CMOS technology with a supply voltage of 1.8 V. The hardware‐efficient 128‐point FFT processor with four data streams can support a data processing rate of up to 1 Gsample/s while consuming 112 mW. The implementation results show that the proposed 128‐point mixed‐radix FFT architecture significantly reduces the hardware cost and power consumption in comparison to existing 128‐point FFT architectures.

8.

Real-time FFT algorithm applied to on-line spectral analysis

Pei-Chen Lo Yu-Yun Lee 《Circuits, Systems, and Signal Processing》1999,18(4):377-393

On-line running spectral analysis is of considerable interest in many electrophysiological signals, such as the EEG (electroencephalograph). This paper presents a new method of implementing the fast Fourier transform (FFT) algorithm. Our real-time FFT algorithm efficiently utilizes computer time to perform the FFT computation while data acquisition proceeds so that local butterfly modules are built using the data points that are already available. The real-time FFT algorithm is developed using the decimation-in-time split-radix FFT (DIT sr-FFT) butterfly structure. In order to demonstrate the synchronization ability of the proposed algorithm, the authors develop a method of evaluating the number of arithmetic operations that it requires. Both the derivation and the experimental result show that the real-time FFT algorithm is superior to the conventional whole-block FFT algorithm in synchronizing with the data acquisition process. Given that the FFT size N = 2^r, real-time implementation of the FFT algorithm requires only 2/r the computational time required by the whole-block FFT algorithm. This work was supported by the National Science Council of Taiwan, Republic of China, under grant NSC87-2213-E-009-128.

9.

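Design and Implementation of a Mixed-Radix Reconfigurable FFT Processor

Song Yukun (宋宇鲲), Qu Shuangshuang (曲双双), Xu Lihan (徐礼晗), Zhang Duoli (张多利) 《微电子学与计算机》 (Microelectronics & Computer) 2020,(1):87-92,98

This paper proposes a novel mixed-radix reconfigurable FFT processor. It consists of a novel reconfigurable butterfly unit that supports radix-2/3 FFT and a multi-channel parallel conflict-free memory, which together provide multi-channel data parallelism and continuous operation during the FFT. The design reaches a maximum frequency of 1.06 GHz in a TSMC 28 nm process, and a hardware test system for the mixed-radix FFT processor was built on a Xilinx XC7V2000T FPGA. FPGA hardware tests show that the design supports radix-2, radix-3, and mixed radix-2/3 FFT modes, that its execution speed reaches the theoretical cycle count for the given number of butterfly multipliers, and that for single-precision floating-point data the processor delivers a result accuracy of 10^-5.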
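
For readers unfamiliar with mixed radix-2/3 transforms, the following Python sketch shows a recursive decimation-in-time FFT for lengths of the form 2^a · 3^b, checked against a direct DFT. It illustrates the arithmetic only and says nothing about the processor's butterfly unit or memory organization; the function names are assumptions.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]

def mixed_radix_fft(x):
    """Recursive FFT for lengths n = 2**a * 3**b."""
    n = len(x)
    if n == 1:
        return list(x)
    # Pick the radix for this stage: 2 if possible, otherwise 3.
    for r in (2, 3):
        if n % r == 0:
            break
    else:
        raise ValueError("length must be of the form 2**a * 3**b")
    m = n // r
    # Transform the r interleaved sub-sequences x[q], x[q + r], ...
    subs = [mixed_radix_fft(x[q::r]) for q in range(r)]
    out = [0j] * n
    for k in range(m):
        # Apply the twiddle factors W_n^(q*k) to the sub-results.
        t = [subs[q][k] * cmath.exp(-2j * cmath.pi * q * k / n) for q in range(r)]
        for p in range(r):
            # Radix-r butterfly: an r-point DFT across the twiddled values.
            out[k + p * m] = sum(
                t[q] * cmath.exp(-2j * cmath.pi * p * q / r) for q in range(r)
            )
    return out
```

A 12-point input, for example, is split by two radix-2 stages and one radix-3 stage. Lengths with any other prime factor raise `ValueError`.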

10.

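Research on a Parallel 8192-Point FFT Algorithm Based on RCSIMD

Zhou Guochang (周国昌), Zhang Lixin (张立新) 《微电子学与计算机》 (Microelectronics & Computer) 2011,28(4)

This paper proposes a parallel algorithm for the 8192-point FFT based on the RCSIMD architecture. The algorithm divides the 8192 data points into 64 consecutive blocks of 128 consecutive points each, stored in the local memories of the reconfigurable processing elements. The RCSIMD reconfigurable processing array performs the block-level bit-reversal permutation, while within each block the bit reversal is only logical (the reordering is implicit in the configuration data). This approach to data storage and bit-reversal processing makes full use of the capabilities of the array's communication network and processing elements.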
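
One way to see why the bit reversal can be split into a block-level part and an in-block part is the index identity sketched below in Python. With 64 blocks of 128 points, a 13-bit index is a 6-bit block number followed by a 7-bit offset, and reversing the whole index is the same as reversing each field and swapping their positions. The constants and function names are assumptions for illustration; the paper's actual mapping onto the RCSIMD array is not reproduced here.

```python
BLOCK_BITS = 6    # 64 blocks
OFFSET_BITS = 7   # 128 points per block

def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def reversed_index(block, offset):
    """Bit-reversed 13-bit index built from separately reversed fields."""
    return (bit_reverse(offset, OFFSET_BITS) << BLOCK_BITS) | bit_reverse(
        block, BLOCK_BITS
    )
```

For every index `i` below 8192, `reversed_index(i >> 7, i & 127)` equals `bit_reverse(i, 13)`, so the 6-bit block reversal and the 7-bit in-block reversal can be handled by different mechanisms.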

11.

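Design Techniques for a High-Speed 64-Point FFT Chip

Zhao Mei (赵梅), Ding Xiaolei (丁晓磊), Zhu En (朱恩) 《电子工程师》 (Electronic Engineer) 2007,33(3):13-17

For the implementation of a high-speed 64-point FFT (fast Fourier transform) processing chip, the paper analyzes the principle of the FFT computation and, based on it, presents an improved FFT signal-flow graph. It describes the functional partitioning of the modules of the FFT processor system. Following the processor structure and its special addressing scheme, Verilog HDL is used for the RTL (register-transfer level) design of the controller, double data buffers, address generator, butterfly unit, and I/O control modules. Each module and the complete system are functionally simulated and verified in ModelSim, and simulation waveforms of some key modules are given. The RTL design pays attention to hardware implementation and circuit synthesizability so that the resulting hardware matches the expected performance.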

12.

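Design of an Efficient Configurable Floating-Point FFT Processor

Sang Hongshi (桑红石), Gao Wei (高伟) 《微电子学与计算机》 (Microelectronics & Computer) 2012,29(4):36-40

To overcome the large resource overhead that limits high-precision floating-point FFT processors, a pipelined configurable R2/2²SDF architecture with a shared butterfly structure is built using FIFOs based on single-port memories. The pipeline uses the radix-2/2² algorithm, which suits floating-point design: it eases the implementation of configurable circuits, effectively reduces the number of complex multiplications, and improves the computational efficiency of the complex multipliers. The FIFO memories are implemented with single-port memories of double data width, which avoids the larger area and power of dual-port memories. The improved shared butterfly structure merges two butterfly stages and solves the low butterfly-unit utilization of single-path delay feedback pipelines. Compared with conventional pipelined FFT processor designs, the design effectively reduces the resource overhead of floating-point implementation and improves the utilization of the computing units.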

13.

Fully Systolic FFT Architecture for Giga-sample Applications

K. Babionitakis V. A. Chouliaras K. Manolopoulos K. Nakos D. Reisis N. Vlassopoulos 《Journal of Signal Processing Systems》2010,58(3):281-299

We present a novel 4096 complex-point, fully systolic VLSI FFT architecture based on the combination of three consecutive radix-4 stages resulting in a 64-point FFT engine. The outcome of cascading these 64-point FFT engines is an improved architecture that efficiently processes large input data sets in real time. Using 64-point FFT engines reduces the buffering and the latency to one third of a fully unfolded radix-4 architecture, while the radix-4 schema simplifies the calculations within each engine. The proposed 4096 complex point architecture has been implemented on an FPGA achieving a post-route clock frequency of 200 MHz resulting in a sustained throughput of 4096 point/20.48 μs. It has also been implemented on a high performance 0.13 μm, 1P8M CMOS process achieving a worst-case (0.9 V, 125 °C) post-route clock frequency of 604.5 MHz and a sustained throughput of 4096 point/3.89 μs while consuming 4.4 W. The architecture is extended to accomplish FFT computations of 16K, 64K and 256K complex points with 352, 256 and 188 MHz operating frequencies respectively.

14.

High Performance Reconfigurable FIR Filter Architecture Using Optimized Multiplier

J. L. Mazher Iqbal S. Varadarajan 《Circuits, Systems, and Signal Processing》2013,32(2):663-682

In mobile communication systems and multimedia applications, the need for efficient reconfigurable digital finite impulse response (FIR) filters has been increasing tremendously because of their advantages of less area, low cost, low power and high speed of operation. This article presents a near-optimum low-complexity, reconfigurable digital FIR filter architecture based on computation sharing multipliers (CSHM), constant shift method (CSM) and modified binary-based common sub-expression elimination (BCSE) method for different word-length filter coefficients. The CSHM identifies common computation steps and reuses them for different multiplications. The proposed reconfigurable FIR filter architecture reduces the adder cost and operates at high speed for low-complexity reconfigurable filtering applications such as channelization, channel equalization, matched filtering, pulse shaping, video convolution functions, signal preconditioning, and various other communication applications. The proposed architecture has been implemented and tested on a Virtex 2 xc2vp2-6fg256 field-programmable gate array (FPGA) with 8-bit, 12-bit, and 16-bit filter coefficient precision. The proposed novel reconfigurable FIR filter architecture using a dynamically reconfigurable multiplier block offers good area and speed improvement compared to existing reconfigurable FIR filter implementations.

15.

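Design and Implementation of a Scalable FFT Processor for Wideband Adaptive OFDM Systems

Yu Hui (余辉), Zhang Zhaoyang (张朝阳) 《电路与系统学报》 (Journal of Circuits and Systems) 2006,11(1):71-75,80

Based on the Radix-2² SDF (single-path delay feedback) butterfly structure, a scalable FFT processor is designed whose transform size is selectable among 64, 256, 1024, and 2048 points. With a modest hardware cost, it meets the requirements of wideband adaptive orthogonal frequency division multiplexing (OFDM) transmission systems: a variable number of subcarriers, high data throughput, low processing latency, and flexible configuration. Taking into account the waveform distribution of the input OFDM signal, the paper also analyzes by simulation how the output signal-to-noise ratio and the number of logic elements used vary with the intermediate processing word length and the twiddle-factor quantization word length. The implementation parameters are chosen accordingly, which improves performance while effectively reducing hardware size.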

16.

Design and Implementation of 256‐Point Radix‐4 100 Gbit/s FFT Algorithm into FPGA for High‐Speed Applications

Gokhan Polat Sitki Ozturk Mehmet Yakut 《ETRI Journal》2015,37(4):667-676

The third‐party FFT IP cores available in today's markets do not meet the speed demands of optical communication. This study deals with the design and implementation of a 256‐point Radix‐4 100 Gbit/s FFT, where computational steps are reconsidered and optimized for high‐speed applications, such as radar and fiber optics. Alternative methods for FFT implementation are investigated and Radix‐4 is found to be the optimal solution for our fully parallel FPGA application. The algorithms that we will implement during the development phase are to be tested on a Xilinx Virtex‐6 FPGA platform. The proposed FFT core has a fully parallel architecture with a latency of nine clocks, and the target clock rate is 312.5 MHz.

17.

Cost-Effective Triple-Mode Reconfigurable Pipeline FFT/IFFT/2-D DCT Processor

Chin-Teng Lin Yuan-Chu Yu Lan-Da Van 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(8):1058-1071

This investigation proposes a novel radix-4² algorithm with the low computational complexity of a radix-16 algorithm but the lower hardware requirement of a radix-4 algorithm. The proposed pipeline radix-4² single delay feedback path (R4²SDF) architecture adopts a multiplierless radix-4 butterfly structure, based on the specific linear mapping of common factor algorithm (CFA), to support both 256-point fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) and 8×8 2D discrete cosine transform (DCT) modes together with a highly efficient feedback shift-register architecture. The segment shift register (SSR) and overturn shift register (OSR) structure are adopted to minimize the register cost for the input re-ordering and post computation operations in the 8×8 2D DCT mode, respectively. Moreover, the retrenched constant multiplier and eight-folded complex multiplier structures are adopted to decrease the multiplier cost and the coefficient ROM size with the complex conjugate symmetry rule and subexpression elimination technology. To further decrease the chip cost, a finite wordlength analysis is provided to indicate that the proposed architecture only requires a 13-bit internal wordlength to achieve 40-dB signal-to-noise ratio (SNR) performance in 256-point FFT/IFFT modes and high digital video (DV) compression quality in 8×8 2D DCT mode. The comprehensive comparison results indicate that the proposed cost effective reconfigurable design has the smallest hardware requirement and largest hardware utilization among the tested architectures for the FFT/IFFT computation, and thus has the highest cost efficiency. The derivation and chip implementation results show that the proposed pipeline 256-point FFT/IFFT/2D DCT triple-mode chip consumes 22.37 mW at 100 MHz at 1.2-V supply voltage in TSMC 0.13-µm CMOS process, which is very appropriate for the RSoCs IP of next-generation handheld devices.

18.

An area-efficient and low-power 64-point pipeline Fast Fourier Transform for OFDM applications

《Integration, the VLSI Journal》2017

In orthogonal frequency division multiplexing (OFDM) based wireless systems, the Fast Fourier Transform (FFT) is a critical block as it occupies a large area and consumes considerable power. In this paper, we present area-efficient and low power 16-bit word-width 64-point radix-2² and radix-2³ pipelined FFT architectures for an OFDM-based IEEE 802.11a wireless LAN baseband. The designs are derived from the radix-2^k algorithm and adopt a Single-Path Delay Feedback (SDF) architecture for hardware implementation. To eliminate the complex multipliers and read-only memory (ROM) which is used for internal storage of twiddle factor coefficients, the proposed 64-point FFT employs a Canonical Signed Digit (CSD) complex constant multiplier using adders, multiplexers and shifters. The complex constant multiplier (CCM) is modified using a common sub-expression sharing block that reduces the area of the design. The proposed radix-2² and radix-2³ pipelined FFT architectures are modeled and implemented using TSMC 180 nm CMOS technology with a supply voltage of 1.8 V. The implementation results show that the proposed architectures significantly reduce the hardware cost and power consumption in comparison to existing 64-point FFT architectures.
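
To show how a Canonical Signed Digit constant multiplier replaces a general multiplier with shifts and adds, here is a small Python sketch. It recodes a non-negative integer constant into signed digits with no two adjacent non-zero digits and multiplies using only shifts, additions, and subtractions. The function names and the 8-bit twiddle quantization mentioned afterwards are assumptions for illustration, not details from the paper.

```python
def csd_digits(constant):
    """Return [(shift, sign), ...] such that constant == sum(sign * 2**shift).

    Signs are +1 or -1 and no two non-zero digits are adjacent.
    This sketch handles non-negative integer constants only.
    """
    if constant < 0:
        raise ValueError("sketch handles non-negative constants only")
    digits, shift = [], 0
    while constant:
        if constant & 1:
            # Low two bits 01 -> digit +1, low two bits 11 -> digit -1.
            sign = 2 - (constant & 3)
            digits.append((shift, sign))
            constant -= sign  # leaves a multiple of 4, so the next digit is 0
        constant >>= 1
        shift += 1
    return digits

def multiply_by_constant(x, digits):
    """Multiply integer x by the constant using only shifts and adds/subtracts."""
    return sum(sign * (x << shift) for shift, sign in digits)
```

For instance, the constant 7 recodes to 8 − 1, so one subtraction replaces two additions. A twiddle value such as cos(π/4) quantized to 8 fractional bits becomes the integer 181, and the product would be shifted right by 8 afterwards.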

19.

A power-scalable reconfigurable FFT/IFFT IC based on a multi-processor ring

Guichang Zhong Fan Xu Willson A.N. Jr. 《Solid-State Circuits, IEEE Journal of》2006,41(2):483-495

A single-chip reconfigurable FFT/IFFT processor that employs a ring-structured multiprocessor architecture is presented. Multi-level reconfigurability is realized by dynamically allocating computation resources needed by specific applications. The processor IC was fabricated in 0.25-µm CMOS. It performs 8-point to 4096-point complex FFT/IFFT with power-consumption scalability and provides useful trade-offs between algorithm flexibility, implementation complexity and energy efficiency.

20.

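Design and FPGA Implementation of an FFT Signal Processor Based on a Feedback Control Structure

Zong Aihua (宗爱华), Zhang Shuang (张双), Wu Yang (吴恙), Yang Weiming (杨维明) 《信息通信》 (Information & Communications) 2012,(3):49-51

The design and implementation of an FPGA-based radix-2 FFT algorithm are studied. To reduce hardware resource overhead, the overall hardware structure of the radix-2 FFT processor is designed as a feedback structure formed by a butterfly unit and a controller unit. The butterfly circuit is designed with a timing-control method, and the generation and control of the twiddle-factor coefficients are implemented with a synchronous finite state machine (FSM). The complete FFT processor circuit is implemented on an FPGA using the Quartus II software platform, and simulation verifies the correctness of the design.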