1.

New radix-(2×2×2)/(4×4×4) and radix-(2×2×2)/(8×8×8) DIF FFT algorithms for 3-D DFT

Bouguezel S. Ahmad M.O. Swamy M.N.S. 《IEEE transactions on circuits and systems. I, Regular papers》2006,53(2):306-315

In this paper, new three-dimensional (3-D) radix-(2×2×2)/(4×4×4) and radix-(2×2×2)/(8×8×8) decimation-in-frequency (DIF) fast Fourier transform (FFT) algorithms are developed and their implementation schemes discussed. The algorithms are developed by introducing the radix-2/4 and radix-2/8 approaches in the computation of the 3-D DFT using the Kronecker product and appropriate index mappings. The butterflies of the proposed algorithms are characterized by simple closed-form expressions facilitating easy software or hardware implementations of the algorithms. Comparisons between the proposed algorithms and the existing 3-D radix-(2×2×2) FFT algorithm are carried out showing that significant savings in terms of the number of arithmetic operations, data transfers, and twiddle factor evaluations or accesses to the lookup table can be achieved using the radix-(2×2×2)/(4×4×4) DIF FFT algorithm over the radix-(2×2×2) FFT algorithm. It is also established that further savings can be achieved by using the radix-(2×2×2)/(8×8×8) DIF FFT algorithm.

2.

Mixed-radix, virtually scaling-free CORDIC algorithm based rotator for DSP applications

《Integration, the VLSI Journal》2021

In this work, we propose a novel Coordinate Rotation DIgital Computer (CORDIC) rotator algorithm that converges faster by performing radix-2, radix-4, and radix-16 CORDIC iterations while keeping the scale factor implicitly constant. The mixed radix speeds up convergence and thereby reduces the computational latency of the CORDIC algorithm. The main concern with higher-radix CORDIC algorithms is the compensation of a variable scale factor. To solve this problem, a Taylor series approximation of sine and cosine is proposed for the higher-radix CORDIC algorithm to achieve scaling-free rotation of a two-dimensional vector. The scaling-free rotation removes the read-only memory (ROM) needed to store the scale factor of a higher-radix CORDIC algorithm. Further, the proposed CORDIC algorithm is designed in rotation mode and optimized by removing the Z datapath for digital signal processing (DSP) applications in which the angle of rotation is known in advance. Finally, the multipath delay commutator (MDC) fast Fourier transform (FFT) algorithm is implemented on an FPGA with the proposed CORDIC-based rotator. The proposed design is compared with existing designs. Compared with the radix-16 CORDIC rotator-based FFT implementation, the implementation proposed in this article uses 17% fewer resources.

3.

Configurable Floating-Point FFT Accelerator on FPGA Based Multiple-Rotation CORDIC

《电子学报:英文版》2016,(6):1063-1070

The fast Fourier transform (FFT) accelerator and the coordinate rotation digital computer (CORDIC) algorithm play important roles in signal processing. We propose a configurable floating-point FFT accelerator based on CORDIC rotation, in which twiddle direction prediction is presented to reduce hardware cost and twiddle angles are generated in real time to save memory. To finish CORDIC rotation efficiently, a novel approach is presented that uses segmented-parallel iteration and compress iteration based on CSA, and redundant CORDIC is used to reduce the latency of each iteration. To prove the efficiency of our FFT accelerator, four FFT accelerators are prototyped on an FPGA chip to perform a batch FFT. Experimental results show that our structure, which is composed of four butterfly units and performs FFTs with sizes ranging from 64 to 8192 points, occupies 33230 (3%) REGs and 143006 (30%) LUTs. The clock frequency can reach 122 MHz. The resource usage of the double-precision FFT is only about 2.5 times that of the single-precision FFT, while the theoretical value is 4. Moreover, only 13331 cycles are required to compute an 8192-point double-precision FFT with four butterfly units in parallel.

4.

Design of a pipelined FFT processor based on the CORDIC algorithm

李靖宇《电视技术》2012,36(23):61-64,145

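The paper first analyzes the principle of the radix-2 FFT algorithm and the hardware structure for implementing an FFT processor on an FPGA. It then examines in detail the process of implementing the FFT on an FPGA, realizes the twiddle-factor multiplier with the CORDIC algorithm, and resolves several key problems encountered in the overall design. Finally, a radix-2 pipelined FFT processor is implemented in Verilog. Combined MATLAB and ModelSim simulation shows that the design meets the basic requirements of an FFT processor: at a 10 MHz sampling rate, a 32-point FFT takes only 14.45 μs. The design method is also simple and practical, and has some value for wider adoption.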
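As an illustration of how a CORDIC rotation can stand in for a twiddle-factor multiplier, here is a minimal floating-point sketch of the conventional radix-2 rotation-mode CORDIC. It is not the paper's Verilog design: the function names `cordic_rotate` and `twiddle_multiply`, the iteration count, and the restriction to angles within ±π/2 are assumptions made for this example.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians, assuming |angle| <= pi/2."""
    # Elementary angles atan(2^-i); hardware would read these from a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # The iterations stretch the vector by a constant gain, removed once at the end.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return x / gain, y / gain

def twiddle_multiply(re, im, k, n):
    """Multiply re + j*im by W_n^k = exp(-2*pi*j*k/n), assuming 0 <= k <= n/4."""
    return cordic_rotate(re, im, -2.0 * math.pi * k / n)
```

For example, `twiddle_multiply(1.0, 0.0, 1, 8)` approximates cos(π/4) − j·sin(π/4). Twiddle factors outside the first quadrant would need a quadrant swap or sign change first, which this sketch leaves out.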

5.

FPGA design and implementation of the WFTA algorithm

魏鹏孙磊王华力《通信技术》2011,44(4):167-169

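The Winograd Fourier transform algorithm (WFTA) exploits the properties of the twiddle factor W to decompose it, reducing the number of multiplications in FFT computation to a minimum; it is an efficient FFT implementation method with relatively low resource usage. Taking as an example a 256-point transform decomposed into a two-dimensional 16×16-point small-array WFTA, the paper presents an FPGA design and implementation scheme for the large-array WFTA. Simulation tests show that the designed 256-point FFT processor consumes only 1/2 of the multiplier resources of a radix-2 FFT and 2/3 of those of a radix-4 FFT, and completes the computation in only 5.8 μs at a 100 MHz main clock frequency, meeting the high-speed real-time requirements of FFT processors.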

6.

An area-efficient and low-power 64-point pipeline Fast Fourier Transform for OFDM applications

《Integration, the VLSI Journal》2017

In orthogonal frequency division multiplexing (OFDM) based wireless systems, the Fast Fourier Transform (FFT) is a critical block because it occupies a large area and consumes considerable power. In this paper, we present area-efficient and low-power 16-bit word-width 64-point radix-2² and radix-2³ pipelined FFT architectures for an OFDM-based IEEE 802.11a wireless LAN baseband. The designs are derived from the radix-2^k algorithm and adopt a Single-Path Delay Feedback (SDF) architecture for hardware implementation. To eliminate the complex multipliers and the read-only memory (ROM) used for internal storage of twiddle factor coefficients, the proposed 64-point FFT employs a Canonical Signed Digit (CSD) complex constant multiplier built from adders, multiplexers, and shifters. The complex constant multiplier (CCM) is modified with a common sub-expression sharing block that reduces the area of the design. The proposed radix-2² and radix-2³ pipelined FFT architectures are modeled and implemented in TSMC 180 nm CMOS technology with a supply voltage of 1.8 V. The implementation results show that the proposed architectures significantly reduce hardware cost and power consumption in comparison to existing 64-point FFT architectures.
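The idea of replacing a constant multiplier with shifts and adds can be sketched in software. The example below recodes a non-negative integer constant into canonical signed digits (digits −1, 0, +1 with no two adjacent nonzero digits) and multiplies by it using only shifts, additions, and subtractions. The helper names `to_csd` and `csd_multiply` are hypothetical, and the sketch covers neither the complex-valued multiplier nor the sub-expression sharing described in the abstract.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 gives +1 and value % 4 == 3 gives -1, so the
            # remainder is divisible by 4 and the next digit is forced to 0.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits

def csd_multiply(x, digits):
    """Multiply integer x by the constant encoded in `digits` using shifts and adds."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc
```

For instance, `to_csd(7)` yields the digits for 8 − 1, so multiplying by 7 takes one shift and one subtraction rather than three partial products.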

7.

Parallel Memory Accessing for FFT Architectures

V. Kitsakis K. Nakos D. Reisis N. Vlassopoulos 《Journal of Signal Processing Systems》2018,90(11):1593-1607

The current paper introduces an efficient technique for parallel data addressing in FFT architectures performing in-place computations. The novel addressing organization provides parallel load and store of the data involved in radix-r butterfly computations and leads to an efficient architecture when r is a power of 2. The addressing scheme is based on a permutation of the FFT data, which leads to the improvement of the address generating circuit and the butterfly processor control. Moreover, the proposed technique is suitable for mixed radix applications, especially for radixes that are powers of 2 and straightforward continuous flow implementation. The paper presents the technique and the resulting FFT architecture and shows the advantages of the architecture compared to hitherto published results. The implementations on a Xilinx FPGA Virtex-7 VC707 of the in-place radix-8 FFT architectures with input sizes 64 and 512 complex points validate the results.

8.

Multi-mode parallel and folded VLSI architectures for 1D-fast Fourier transform

《Integration, the VLSI Journal》2016

Modern real-time applications such as orthogonal frequency division multiplexing demand high-performance fast Fourier transform (FFT) designs with less area and fewer clock cycles. This paper proposes efficient FFT VLSI architectures using folded/parallel implementation. In the proposed folded FFT architecture, the number of cycles required to complete the operation is lower than in single-path delay feedback (SDF)/multi-path delay commutator (MDC) architectures. In the proposed parallel FFT architecture, an N-point FFT is implemented using one N/2-point FFT without much extra hardware. Both proposed architectures are implemented for radix-2, radix-2², and radix-4 using a 45 nm technology library. The proposed parallel architecture achieves area reductions of 56.7% and 40.6% compared with the existing parallel-architecture-based 16-point radix-2 and radix-2² DIF FFTs, respectively. The proposed folded architecture achieves worst-path delay reductions of 65.5%, 51.1%, and 35.8% compared with the existing SDF-based 16-point radix-2, radix-2², and radix-4 DIF FFTs, respectively.

9.

Reduced Memory and Low Power Architectures for CORDIC-based FFT Processors

Erdal Oruklu Xin Xiao Jafar Saniie 《Journal of Signal Processing Systems》2012,66(2):129-134

This paper presents a pipelined, reduced memory and low power CORDIC-based architecture for fast Fourier transform implementation. The proposed algorithm utilizes a new addressing scheme and the associated angle generator logic in order to remove any ROM usage for storing twiddle factors. As a case study, the radix-2 and radix-4 FFT algorithms have been implemented on FPGA hardware. The synthesis results match the theoretical analysis and it can be observed that more than 20% reduction can be achieved in total memory logic. In addition, the dynamic power consumption can be reduced by as much as 15% by reducing memory accesses.

10.

Design and implementation of a mixed-radix FFT algorithm based on FPGA

侯晓晨孟骁陈昊《太赫兹科学与电子信息学报》2021,19(2):303-307

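At present, research on resource-saving, low-complexity mixed-radix fast Fourier transform (FFT) design techniques has significant application value. Based on a field-programmable gate array (FPGA) platform, this paper proposes and implements a new mixed-radix FFT decomposition algorithm. The algorithm is designed around an in-place storage structure and uses a hybrid decomposition mode that combines prime-factor decomposition with Cooley-Tukey decomposition, eliminating one step of twiddle-factor multiplication while also effectively reducing storage space and computation,...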

11.

Balanced Binary-Tree Decomposition for Area-Efficient Pipelined FFT Processing

Lee H.-Y. Park I.-C. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(4):889-900

This paper presents an area-efficient algorithm for the pipelined processing of the fast Fourier transform (FFT). The proposed algorithm decomposes a discrete Fourier transform (DFT) into two balanced sub-DFTs in order to minimize the total number of twiddle factors to be stored in tables. The radix in the proposed decomposition is adaptively changed according to the remaining transform length so that the transform lengths of the sub-DFTs resulting from the decomposition are as close as possible. An 8192-point pipelined FFT processor designed for digital video broadcasting-terrestrial (DVB-T) systems saves 33% of general multipliers and 23% of the total size of twiddle factor tables compared to a conventional pipelined FFT processor based on the radix-2² algorithm. In addition to the decomposition, several implementation techniques are proposed to reduce area, such as a simple index generator for twiddle factors and add/subtract units combined with the two's complement operation.

12.

A new radix-2/8 FFT algorithm for length-q×2^m DFTs

Bouguezel S. Ahmad M.O. Swamy M.N.S. 《IEEE transactions on circuits and systems. I, Regular papers》2004,51(9):1723-1732

In this paper, a new radix-2/8 fast Fourier transform (FFT) algorithm is proposed for computing the discrete Fourier transform of an arbitrary length N = q×2^m, where q is an odd integer. It reduces substantially the operations such as data transfer, address generation, and twiddle factor evaluation or access to the lookup table, which contribute significantly to the execution time of FFT algorithms. It is shown that the arithmetic complexity (multiplications + additions) of the proposed algorithm is, in most cases, the same as that of the existing split-radix FFT algorithm. The basic idea behind the proposed algorithm is the use of a mixture of radix-2 and radix-8 index maps. The algorithm is expressed in a simple matrix form, thereby facilitating an easy implementation of the algorithm, and allowing for an extension to the multidimensional case. For the structural complexity, the important properties of the Cooley-Tukey approach such as the use of the butterfly scheme and in-place computation are preserved by the proposed algorithm.

13.

A high-speed 64-point radix-4 FFT processor based on MVR-CORDIC

侯卫华郭晖刘明峰于宗光《电子与封装》2008,8(5):22-25

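This paper designs a 64-point radix-4 FFT processor in which a modified CORDIC (MVR-CORDIC) processing unit replaces the complex multiplier of a conventional FFT processor. While maintaining SQNR performance, the modified CORDIC unit completes one complex multiplication with only a very small number of shift-and-add operations, shortening the time of a basic butterfly operation and reducing area overhead. The FFT processor uses two independent RAMs and stores intermediate data in a ping-pong fashion to save data-storage time, thereby increasing the speed of an FFT computation. The designed FFT processor was verified on an FPGA; the results show that a 64-point FFT takes less than 1 μs on average.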

14.

IS-CORDIC: a fixed-point inverse recoded single iteration CORDIC architecture

Hamid Mehmood Allah Ditta Kamboh Shoab Ahmed Khan 《International Journal of Electronics》2013,100(6):789-807

This paper presents a novel modified Coordinate Rotation Digital Computer (CORDIC) architecture that computes values of sine and cosine in a single cycle. The proposed method utilises an angle-recoding technique to design a modified CORDIC algorithm. Multiple iterations are merged in the modified algorithm using memory storage for initial iterations and employing inverse recoding to generate constant multiplication factors for the remaining iterations. The scale factor of the algorithm remains constant, as these factors are independent of intermediate directions of rotation. In addition, the architecture is mapped onto a single CORDIC computation element that requires only a single cycle to compute the result. These multiplications are implemented using dedicated hardware multipliers in Field Programmable Gate Arrays and customised fixed-point multiplication techniques for Application Specific Integrated Circuits. Implementation results show that the proposed IS-CORDIC architecture is 7.9 times more efficient than basic CORDIC and has a lower area-delay product than current state-of-the-art implementations.

15.

Hardware architecture for an anti-traffic noise system

《Microelectronics Journal》2015,46(5):370-376

This work presents an energy-efficient architecture for an anti-traffic noise system. The hardware is designed for a road side unit (RSU) in intelligent transportation systems. The Fast Fourier Transform is the cornerstone of the suggested system. An ultra-low-power architecture for the FFT suitable for FPGA implementation is derived. Bit-widths for both data and twiddle factors are optimized for low power. The architecture uses an efficient complex multiplier that has 25% fewer multiplications. An algorithm to compute the number of time-shared butterflies for a given FFT block size and a target throughput is elaborated. Finally, synthesis results using a fixed-point VHDL library and commercial IP are presented and compared with the proposed FFT processor.

16.

FPGA implementation of a high-speed FFT processor for OFDM systems (total citations: 1; self-citations: 0; citations by others: 1)

顾晴茹周玉洁《信息技术》2005,29(12):70-73

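To meet the design requirements of the FFT processor in OFDM systems, the radix-4 DIF FFT algorithm flow is selected and analyzed in detail, and a high-speed FFT signal processor is developed using field-programmable design. The design is described in Verilog HDL and has passed simulation and verification.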
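For readers unfamiliar with the radix-4 decimation-in-frequency flow, the following recursive sketch shows its structure: each stage applies a 4-point butterfly (multiplications by powers of −j only), multiplies by the twiddle factors W_N^{nk}, and recurses on four quarter-length transforms. It is a software illustration under the assumption that the length is a power of 4, not the Verilog design from the paper; the function names are chosen for this example.

```python
import cmath

def fft_radix4_dif(x):
    """Radix-4 DIF FFT of a sequence whose length is a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    q = n // 4
    out = [0j] * n
    for k in range(4):
        branch = []
        for i in range(q):
            # 4-point butterfly: the weights (-j)^(l*k) need no real multiplier.
            s = sum(x[i + l * q] * (-1j) ** (l * k) for l in range(4))
            # Twiddle factor W_n^(i*k) applied after the butterfly (DIF ordering).
            branch.append(s * cmath.exp(-2j * cmath.pi * i * k / n))
        # Output bins k, k+4, k+8, ... come from the k-th quarter-length FFT.
        out[k::4] = fft_radix4_dif(branch)
    return out

def dft_naive(x):
    """Direct O(N^2) DFT, used here only as a reference for checking."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))
            for k in range(n)]
```

Comparing `fft_radix4_dif(x)` with `dft_naive(x)` on a 16-point or 64-point input is a quick way to confirm the index mapping.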

17.

A new method for designing high-speed FFT processors for OFDM modulation

许婵媛罗汉文宋文涛《通信技术》2003,(5):15-17

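In implementing wideband OFDM systems, the FFT processor is a key component. By improving the conventional split-radix structure, a new FFT processor scheme suited to OFDM systems is proposed. The scheme uses pipelining to guarantee system speed and balances computation, communication, and storage, so that data fetching, twiddle-factor computation, complex multiplication, and DFT operations are coordinated and bottlenecks are avoided. A comparison with previously proposed FFT processor schemes shows that the new scheme uses fewer multipliers and fewer storage units, improving device utilization.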

18.

Computationally Efficient Architecture for Accurate Frequency Estimation with Fourier Interpolation

Dongpei Liu Hengzhu Liu Li Zhou Jianfeng Zhang Botao Zhang 《Circuits, Systems, and Signal Processing》2014,33(3):781-797

A simplified DFT-based algorithm and its VLSI implementation for accurate frequency estimation of a single-tone complex sinusoid signal are investigated. The proposed algorithm estimates frequency by interpolation using Fourier coefficients. It consists of a coarse search followed by a fine search, and its performance closely approaches the Cramér–Rao lower bound (CRLB) even in the low-SNR region. Moreover, a pipelined triple-mode CORDIC architecture is designed to efficiently support complex multiplication, complex magnitude calculation, and real division. The triple-mode CORDIC-based radix-4 architecture is employed for the hardware implementation of the frequency estimator, and is suitable not only for the fast Fourier transform but also for accurate frequency estimation. A frequency estimator with 1024-point samples is implemented and verified on an FPGA. It works at 215 MHz on a Xilinx XC6VLX240T FPGA device, and uses 4,161 registers and 6,986 slice LUTs. ASIC synthesis results show that it requires an area of 60K equivalent NAND2 gates with a clock rate of 500 MHz in SMIC 0.18 μm technology. The total latency of the frequency estimator is 2336 cycles. The proposed architecture provides a good trade-off between hardware overhead, estimation performance, and computation latency.
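The coarse-then-fine structure can be sketched as follows: a coarse search picks the DFT bin with the largest magnitude, and a fine step interpolates between that bin and its two neighbours. The fine step here uses a standard three-bin formula on the complex DFT values, which is an assumption for illustration and not necessarily the interpolator used in the paper; `estimate_frequency` and `dft_bin` are hypothetical names, and the direct DFT stands in for an FFT.

```python
import cmath

def dft_bin(x, k):
    """Single DFT coefficient X[k] computed directly."""
    n = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))

def estimate_frequency(x):
    """Estimate the frequency (cycles per sample) of a single complex tone."""
    n = len(x)
    spectrum = [dft_bin(x, k) for k in range(n)]
    # Coarse search: index of the largest-magnitude bin.
    k = max(range(n), key=lambda i: abs(spectrum[i]))
    left = spectrum[(k - 1) % n]
    mid = spectrum[k]
    right = spectrum[(k + 1) % n]
    # Fine search: fractional bin offset from the peak and its two neighbours.
    denom = 2 * mid - left - right
    delta = ((left - right) / denom).real if denom != 0 else 0.0
    return (k + delta) / n
```

On a noiseless 64-sample tone placed between two bins, this estimate lands much closer to the true frequency than the coarse bin index alone; behaviour under noise depends on the interpolator and is not shown here.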

19.

Radix-4 Vectoring CORDIC Algorithm and Architectures

J. Villalba E.L. Zapata E. Antelo J.D. Bruguera 《The Journal of VLSI Signal Processing》1998,19(2):127-147

In this work we extend the radix-4 CORDIC algorithm to the vectoring mode (the radix-4 CORDIC algorithm was proposed recently by the authors for the rotation mode). The extension to the vectoring mode is not straightforward, since the digit selection function is more complex in the vectoring case than in the rotation case; as in the rotation mode, the scale factor is not constant. Although the radix-4 CORDIC algorithm in vectoring mode has a recurrence similar to that of the radix-4 division algorithm, there are specific issues concerning the vectoring algorithm that demand dedicated study. We present the digit selection for nonredundant and redundant arithmetic (following two different approaches: arithmetic comparisons and table look-up), the computation and compensation of the scale factor, and the implementation of the algorithm (with both types of digit selection) in a word-serial architecture. When compared with conventional radix-2 (redundant and nonredundant) architectures, the radix-4 algorithms present a significant speed-up for angle calculation. For the computation of the magnitude the speed-up is very slight, due to the nonconstant scale factor in the radix-4 algorithm.

20.

A new variable-radix fast multiplication hardware algorithm for public-key cryptosystems

盖伟新《电子学报》1995,23(11):77-80

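This paper proposes a new variable-radix fast multiplication hardware algorithm. The algorithm uses a redundant representation of binary numbers, so that two large numbers (512 bits or larger) can be added in O(1) time without waiting for carries. It also introduces the idea of variable-radix fast multiplication, which makes the algorithm 33% faster than the radix-4 multiplication algorithm and 11% faster than the radix-8 multiplication algorithm while being simpler to implement in hardware. The algorithm also overcomes the severe slowdown of the radix-8 multiplication algorithm under bad and worst-case conditions, and is a new fast algorithm that can serve effectively as the core operation in hardware VLSI implementations of many public-key cryptosystems (such as RSA).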
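Addition without carry propagation can be illustrated with the carry-save form, one common redundant representation; whether the paper uses exactly this representation is not stated in the abstract, so treat it as an assumption. A running total is kept as a pair of words, and each new operand is absorbed with bitwise operations whose depth does not depend on the word length. The function names are hypothetical.

```python
def carry_save_add(sum_word, carry_word, operand):
    """Absorb `operand` into the redundant pair (sum_word, carry_word).

    Every bit position is an independent full adder, so no carry ripples
    across the word; in hardware the delay is constant in the word length.
    """
    new_sum = sum_word ^ carry_word ^ operand
    new_carry = ((sum_word & carry_word)
                 | (sum_word & operand)
                 | (carry_word & operand)) << 1
    return new_sum, new_carry

def resolve(sum_word, carry_word):
    """Convert the redundant pair to an ordinary integer (one carry-propagating add)."""
    return sum_word + carry_word
```

In a multiplier, the partial products would be accumulated with `carry_save_add`, and the single slow carry-propagating addition in `resolve` would be performed only once at the end.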