期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

FPGA implementation of high-performance,resource-efficient Radix-16 CORDIC rotator based FFT algorithm

《Integration, the VLSI Journal》2020

The fast Fourier transform (FFT) is an algorithm widely used to compute the discrete Fourier transform (DFT) in real-time digital signal processing. High-performance with fewer resources is highly desirable for any real-time application. Our proposed work presents the implementation of the radix-2 decimation-in-frequency (R2DIF) FFT algorithm based on the modified feed-forward double-path delay commutator (DDC) architecture on FPGA device. Need for a complex multiplier to carry out the multiplication of complex twiddle factors and large memory to store the twiddle factors are the main concerns for FFT implementation. Propose work aims to address these issues. In this work, a high-performance radix-16 COordinate Rotational DIgital Computer (CORDIC) algorithm based rotator is proposed to carry out the complex twiddle factor multiplication. Further, CORDIC needs only rotational angles to carry out complex multiplication, which reduces the need for large memory to store the twiddle factors. To compute the total rotation for n-bit precision, our proposed radix-16 CORDIC algorithm takes n/4 iteration as compared to n iteration of the radix-2 CORDIC algorithm. Our proposed architecture of the radix-2 decimation-in-frequency (R2DIF) algorithm is implemented on a Virtex−7 series FPGA. Further, the detailed comparison is presented between our proposed FFT implementation and other recently proposed FFT implementations. Experimental results suggest that proposed implementation has less latency and hardware utilization as compared to recently proposed implementations. 相似文献

2.

Radix-4 Vectoring CORDIC Algorithm and Architectures

J. Villalba E.L. Zapata E. Antelo J.D. Bruguera 《The Journal of VLSI Signal Processing》1998,19(2):127-147

In this work we extend the radix-4 CORDIC algorithm to the vectoring mode (the radix-4 CORDIC algorithm was proposed recently by the authors for the rotation mode). The extension to the vectoring mode is not straightforward, since the digit selection function is more complex in the vectoring case than in the rotation case; as in the rotation mode, the scale factor is not constant. Although the radix-4 CORDIC algorithm in vectoring mode has a similar recurrence as the radix-4 division algorithm, there are specific issues concerning the vectoring algorithm that demand dedicated study. We present the digit selection for nonredundant and redundant arithmetic (following two different approaches: arithmetic comparisons and table look-up), the computation and compensation of the scale factor, and the implementation of the algorithm (with both types of digit selection) in a word-serial architecture. When compared with conventional radix-2 (redundant and non-redundant) architectures, the radix-4 algorithms present a significant speed up for angle calculation. For the computation of the magnitude the speed up is very slight, due to the nonconstant scale factor in the radix-4 algorithm. 相似文献

3.

长度为2^n的FFT运算基实现

张思为胡剑浩《中国集成电路》2010,19(6):53-57,75

大点数FFT运算是数字信号处理中关键技术环节,本文提出一种大点数FFT运算基的实现,该实现是根据[1]中所提出的算法,结合寄存器阵列模块和重排序模块,实现FFT运算基模块内部的数据传输和模式切换,以基4与基2为模块中的基本运算单元构成大点数的FFT运算基,在控制电路配合下实现快速傅里叶变换。该实现通过面向寄存器级的Simulink仿真模型,验证本文所设计模块功能的正确性和可行性,为基于大点数的FFT运算指出了一种实现方法。相似文献

4.

快速图像匹配相关系数算法及实现 总被引：1，自引：0，他引：1

刘红侠杨靓黄巾黄士坦《微电子学与计算机》2007,24(2):32-35

最大归一互相关图像匹配算法是图像匹配中的常用算法,其关键是解算活动图与基准图间的相关系数。针对相关系数计算量大的特点,分析了FFT的基与FFT处理速度之间的关系以及基16FFT算法特点,提出用基16FFT算法计算相关系数,相关系数的处理时间大幅减小;同时针对高基蝶形单元设计复杂、使用不灵活等特点,提出采用级连思想实现主基16蝶形单元,使处理器的设计复杂度降低。实验证明,将主基16FFT处理器用于相关系数的计算中,使最大归一互相关图像匹配处理速度达到国际领先水平。相似文献

5.

免缩放因子双步旋转CORDIC算法 总被引：7，自引：0，他引：7

下载免费PDF全文

徐成秦云川李肯立戚芳芳《电子学报》2014,42(7):1441-1445

集成电路设计中经常使用CORDIC算法实现高效的向量旋转操作.当前对该算法的研究热点集中在减少该算法的迭代次数、扩展其收敛范围以及降低缩放因子补偿操作的代价等问题上.本文提出免缩放因子的双步旋转CORDIC算法使用双步旋转策略,减少了免缩放因子CORDIC算法的迭代次数,将收敛区间扩展到了整个圆周区间.实验结果表明,该算法保持高计算精度的同时减少了迭代次数和面积消耗. 相似文献

6.

Leading One Detection Hyperbolic CORDIC with Enhanced Range of Convergence

Supriya Aggarwal Kavita Khare 《Journal of Signal Processing Systems》2013,70(1):49-57

This paper focuses on developing an area efficient hyperbolic Coordinate Rotation Digital Computer (CORDIC) algorithm with performance improvement. The algorithm eliminates the need of scale factor calculation in the Range of Convergence (ROC). At the same time the range of convergence offered is higher than the conventional CORDIC ROC in the hyperbolic rotation mode. Being the only kind of algorithm in hyperbolic rotation with sign sequence μ?=?1 always, one complete operation requires just 5 iterations. Thus the pipelined implementation has 5 stages which provides a 50% increase in throughput in comparison to conventional CORDIC. As far as the area improvement is considered, 16-bit processor can be realized using 56% less number of full adders required by Flat-CORDIC. The x and y datapath are based on series expansion of hyperbolic functions. The complete algorithm design along with pipelined architecture implementation is detailed. 相似文献

7.

An area-efficient and low-power 64-point pipeline Fast Fourier Transform for OFDM applications

《Integration, the VLSI Journal》2017

In an orthogonal frequency division multiplexing (OFDM) based wireless systems, Fast Fourier Transform (FFT) is a critical block as it occupies large area and consumes more power. In this paper, we present an area-efficient and low power 16-bit word-width 64-point radix-2² and radix-2³ pipelined FFT architectures for an OFDM-based IEEE 802.11a wireless LAN baseband. The designs are derived from radix-2^k algorithm and adopt a Single-Path Delay Feedback (SDF) architecture for hardware implementation. To eliminate the complex multipliers and read-only memory (ROM) which is used for internal storage of twiddle factor coefficients, the proposed 64-point FFT employs a Canonical Signed Digit (CSD) complex constant multiplier using adders, multiplexers and shifters. The complex constant multiplier (CCM) is modified using common sub-expression sharing block that reduces the area of the design. The proposed radix-2² and radix-2³ pipelined FFT architectures are modeled and implemented using TSMC 180 nm CMOS technology with a supply voltage of 1.8 V. The implementation results show that the proposed architectures significantly reduces the hardware cost and power consumption in comparison to existing 64-point FFT architectures. 相似文献

8.

Pipelining flat CORDIC based trigonometric function generators

《Microelectronics Journal》2002,33(1-2):77-89

Despite further refinements of the CORDIC algorithm with the introduction of redundant arithmetic and higher radix CORDIC techniques, in terms of circuit latency and performance, the iterative nature remains to be the major bottleneck for further optimization. A technique known as flat CORDIC, in which the conventional X and Y recurrences are successively substituted to express the final vectors in terms of the initial vectors, can be used to eliminate the iterative process. In this paper, the techniques devised for the VLSI efficient implementation of a pipelined 16-bit flat CORDIC based sine–cosine generator are presented. Three possible schemes to pipeline the 16-bit flat CORDIC design have been presented to demonstrate the suitability of the proposed method to realize high throughput implementations. The 16-bit architecture has been synthesized with 0.35 μ CMOS process library using Synopsys. Finally, a detailed comparison with other major contributions show that the flat CORDIC based sine–cosine generators are, on average, 30% faster and occupy some 30% less silicon area. 相似文献

9.

Very-High Radix CORDIC Rotation Based on Selection by Rounding

Elisardo Antelo Tomás Lang Javier D. Bruguera 《The Journal of VLSI Signal Processing》2000,25(2):141-153

A very-high radix algorithm and implementation for CORDIC rotation in circular and hyperbolic coordinates is presented. The selection function consists of rounding the residual. It is shown that this assures convergence from the second iteration on. For the first iteration, the selection is done by table, using a lower radix than for the remaining iterations. The compensation of the variable scale factor is done by computing the logarithm of the scale factor and performing the compensation by an exponential. Estimations of the delay for 32-bit and 64-bit precision show a substantial speed up when compared to low radix implementations. The proposed algorithm is also compared with previously proposed very-high radix ones, and significant advantages are identified. 相似文献

10.

Multi-mode parallel and folded VLSI architectures for 1D-fast Fourier transform

《Integration, the VLSI Journal》2016

The modern real time applications like orthogonal frequency division multiplexing and etc., demand high performance fast Fourier transform (FFT) design with less area and clock cycles. This paper proposes efficient FFT VLSI architectures using folded/parallel implementation. In the proposed folded FFT architecture, the number of cycles required to complete the operation is less than single path delay feedback (SDF)/multi-path delay commutator (MDC) architectures. In the proposed parallel FFT architecture, N-point FFT is implemented by using one N/2-point FFT without much extra hardware. Both the proposed architectures are implemented for radix-2, 2², and 4 using 45 nm technology library. The proposed parallel architecture achieves 56.7% and 40.6% of area reduction as compared with the existing parallel architecture based 16-point radix-2 and radix-2² DIF FFTs respectively. The proposed folded architecture achieves 65.5%, 51.1%, and 35.8% of worst path delay reduction as compared with the existing SDF based 16-point radix-2, radix-2², and radix-4 DIF FFTs respectively. 相似文献

11.

High-Throughput VLSI Architecture for FFT Computation

Chao Cheng Parhi K.K. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2007,54(10):863-867

In this brief, multi-path delay commutator structures are utilized to improve the throughput rate of radix-2 and radix-4 FFT computation by a factor of 2 to 4. Latency can also be reduced by a factor of 2 to 3. Compared with previous radix-2 and radix-4 FFT structures, the proposed high-throughput FFT with doubled throughput rate requires similar or even less hardware cost. Although split radix FFT design is more hardware efficient, the regular structure of proposed FFT structures are attractive for high throughput FFT design. [All rights reserved Elsevier]. 相似文献

12.

Balanced Binary-Tree Decomposition for Area-Efficient Pipelined FFT Processing

Lee H.-Y. Park I.-C. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(4):889-900

This paper presents an area-efficient algorithm for the pipelined processing of fast Fourier transform (FFT). The proposed algorithm is to decompose a discrete Fourier transform (DFT) into two balanced sub-DFTs in order to minimize the total number of twiddle factors to be stored into tables. The radix in the proposed decomposition is adaptively changed according to the remaining transform length to make the transform lengths of sub-DFTs resulting from the decomposition as close as possible. An 8192-point pipelined FFT processor designed for digital video broadcasting-terrestrial (DVB-T) systems saves 33% of general multipliers and 23% of the total size of twiddle factor tables compared to a conventional pipelined FFT processor based on the radix-2² algorithm. In addition to the decomposition, several implementation techniques are proposed to reduce area, such as a simple index generator of twiddle factor and add/subtract units combined with the two's complement operation 相似文献

13.

CORDIC based fast algorithm for power-of-two point DCT and its efficient VLSI implementation

《Microelectronics Journal》2014,45(11):1480-1488

—In this paper, we present a coordinate rotation digital computer (CORDIC) based fast algorithm for power-of-two point DCT, and develop its corresponding efficient VLSI implementation. The proposed algorithm has some distinguish advantages, such as regular Cooley-Tukey FFT-like data flow, identical post-scaling factor, and arithmetic-sequence rotation angles. By using the trigonometric formula, the number of the CORDIC types is reduced dramatically. This leads to an efficient method for overcoming the problem that lack synchronization among the various rotation angles CORDICs. By fully reusing the uniform processing cell (PE), for 8-point DCT, only four carry save adders (CSAs)-based PEs with two different types are required. Compared with other known architectures, the proposed 8-point DCT architecture has higher modularity, lower hardware complexity, higher throughput and better synchronization. 相似文献

14.

Configurable Floating-Point FFT Accelerator on FPGA Based Multiple-Rotation CORDIC

《电子学报:英文版》2016,(6):1063-1070

Fast Fourier transform (FFT) accelerator and Coordinate rotation digital computer (CORDIC) algorithm play important roles in signal processing.We propose a conflgurable floating-point FFT accelerator based on CORDIC rotation,in which twiddle direction prediction is presented to reduce hardware cost and twiddle angles are generated in real time to save memory.To finish CORDIC rotation efficiently,a novel approach in which segmentedparallel iteration and compress iteration based on CSA are presented and redundant CORDIC is used to reduce the latency of each iteration.To prove the efficiency of our FFT accelerator,four FFT accelerators are prototyped into a FPGA chip to perform a batch-FFT.Experimental results show that our structure,which is composed of four butterfly units and finishes FFT with the size ranging from 64 to 8192 points,occupies 33230(3％) REGs and 143006(30％)LUTs.The clock frequency can reach 122MHz.The resources of double-precision FFT is only about 2.5 times of single-precision while the theoretical value is 4.What's more,only 13331 cycles are required to implement 8192-points double-precision FFT with four butterfly units in parallel. 相似文献

15.

Cost-Effective Triple-Mode Reconfigurable Pipeline FFT/IFFT/2-D DCT Processor

Chin-Teng Lin Yuan-Chu Yu Lan-Da Van 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(8):1058-1071

This investigation proposes a novel radix-4² algorithm with the low computational complexity of a radix-16 algorithm but the lower hardware requirement of a radix-4 algorithm. The proposed pipeline radix-4² single delay feedback path (R4²SDF) architecture adopts a multiplierless radix-4 butterfly structure, based on the specific linear mapping of common factor algorithm (CFA), to support both 256-point fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) and 8times8 2D discrete cosine transform (DCT) modes following with the high efficient feedback shift registers architecture. The segment shift register (SSR) and overturn shift register (OSR) structure are adopted to minimize the register cost for the input re-ordering and post computation operations in the 8times8 2D DCT mode, respectively. Moreover, the retrenched constant multiplier and eight-folded complex multiplier structures are adopted to decrease the multiplier cost and the coefficient ROM size with the complex conjugate symmetry rule and subexpression elimination technology. To further decrease the chip cost, a finite wordlength analysis is provided to indicate that the proposed architecture only requires a 13-bit internal wordlength to achieve 40-dB signal-to-noise ratio (SNR) performance in 256-point FFT/IFFT modes and high digital video (DV) compression quality in 8 times 8 2D DCT mode. The comprehensive comparison results indicate that the proposed cost effective reconfigurable design has the smallest hardware requirement and largest hardware utilization among the tested architectures for the FFT/IFFT computation, and thus has the highest cost efficiency. The derivation and chip implementation results show that the proposed pipeline 256-point FFT/IFFT/2D DCT triple-mode chip consumes 22.37 mW at 100 MHz at 1.2-V supply voltage in TSMC 0.13-mum CMOS process, which is very appropriate for the RSoCs IP of next-generation handheld devices. 相似文献

16.

Design, Implementation and Analysis of a New Redundant CORDIC Processor with Constant Scaling Factor and Regular Structure 总被引：1，自引：0，他引：1

Shen-Fu Hsiao Jen-Yin Chen 《The Journal of VLSI Signal Processing》1998,20(3):267-278

A new high-speed redundant CORDIC processor is designed and implemented based on the double rotation method, which turns out to be the two-dimensional (2D) Householder CORDIC, a special case of the generalized Householder CORDIC in the 2D Euclidean vector space. The new processor has the advantages of regular structure and high throughput rate. The pipelined structure with radix-2 signed-digit (SD) redundant arithmetic is adopted to reduce the carry-propagation delay of the adders while the digit-serial structure alleviates the burden of the hardware cost and I/O requirement. Compared to previously proposed designs, the new CORDIC processor preserves the constant scaling factor, an important merit of the original CORDIC, and thus does not require any complicated division or square-root operations for variable scaling factor calculation. Furthermore, the processor is well suited to VLSI implementation since it does not call for any irregularly inserted correcting iterations. Both angle calculation mode for computing trigonometric function and vector rotation mode for plane rotations are supported. Practical VLSI chip implementation of the fixed-point redundant CORDIC processor using 0.6 m standard cell library is given including detailed numerical error analysis. 相似文献

17.

A new radix-2/8 FFT algorithm for length-q/spl times/2/sup m/ DFTs

Bouguezel S. Ahmad M.O. Swamy M.N.S. 《IEEE transactions on circuits and systems. I, Regular papers》2004,51(9):1723-1732

In this paper, a new radix-2/8 fast Fourier transform (FFT) algorithm is proposed for computing the discrete Fourier transform of an arbitrary length N=q/spl times/2/sup m/, where q is an odd integer. It reduces substantially the operations such as data transfer, address generation, and twiddle factor evaluation or access to the lookup table, which contribute significantly to the execution time of FFT algorithms. It is shown that the arithmetic complexity (multiplications+additions) of the proposed algorithm is, in most cases, the same as that of the existing split-radix FFT algorithm. The basic idea behind the proposed algorithm is the use of a mixture of radix-2 and radix-8 index maps. The algorithm is expressed in a simple matrix form, thereby facilitating an easy implementation of the algorithm, and allowing for an extension to the multidimensional case. For the structural complexity, the important properties of the Cooley-Tukey approach such as the use of the butterfly scheme and in-place computation are preserved by the proposed algorithm. 相似文献

18.

一种用于公钥密码系统的新型可变Radix快速乘法硬件算法

盖伟新《电子学报》1995,23(11):77-80

本文提出了一种新型的可变ｒａｄｉｘ快速乘法硬件算法，算法中，采用了二进制数的冗余数表示方法，使二个大数（大到５１２ｂｉｔ位或更大）的相加在Ｏ（１）时间内完成而无需等待进位；其次，提出了可变ｒａｄｉｘ快速乘法思想，使算法比ｒａｄｉｘ－４的乘法算法速度提高３３％，比ｒａｄｉｘ－８的乘法算法速度提高１１％而硬件实现更为简单，算法还能克服在较坏和最坏条件下，ｒａｄｉｘ－８乘法算法速度严重下降的缺陷，是一种可以作为核心运算有效地使用在许多公钥密码体制（如ＲＳＡ）硬件ＶＬＳＩ实现中的新型快速算法。相似文献

19.

Parallel Memory Accessing for FFT Architectures

V. Kitsakis K. Nakos D. Reisis N. Vlassopoulos 《Journal of Signal Processing Systems》2018,90(11):1593-1607

The current paper introduces an efficient technique for parallel data addressing in FFT architectures performing in-place computations. The novel addressing organization provides parallel load and store of the data involved in radix-r butterfly computations and leads to an efficient architecture when r is a power of 2. The addressing scheme is based on a permutation of the FFT data, which leads to the improvement of the address generating circuit and the butterfly processor control. Moreover, the proposed technique is suitable for mixed radix applications, especially for radixes that are powers of 2 and straightforward continuous flow implementation. The paper presents the technique and the resulting FFT architecture and shows the advantages of the architecture compared to hitherto published results. The implementations on a Xilinx FPGA Virtex-7 VC707 of the in-place radix-8 FFT architectures with input sizes 64 and 512 complex points validate the results. 相似文献

20.

New radix-(2/spl times/2/spl times/2)/(4/spl times/4/spl times/4) and radix-(2/spl times/2/spl times/2)/(8/spl times/8/spl times/8) DIF FFT algorithms for 3-D DFT

Bouguezel S. Ahmad M.O. Swamy M.N.S. 《IEEE transactions on circuits and systems. I, Regular papers》2006,53(2):306-315

In this paper, new three-dimensional (3-D) radix-(2/spl times/2/spl times/2)/(4/spl times/4/spl times/4) and radix-(2/spl times/2/spl times/2)/(8/spl times/8/spl times/8) decimation-in-frequency (DIF) fast Fourier transform (FFT) algorithms are developed and their implementation schemes discussed. The algorithms are developed by introducing the radix-2/4 and radix-2/8 approaches in the computation of the 3-D DFT using the Kronecker product and appropriate index mappings. The butterflies of the proposed algorithms are characterized by simple closed-form expressions facilitating easy software or hardware implementations of the algorithms. Comparisons between the proposed algorithms and the existing 3-D radix-(2/spl times/2/spl times/2) FFT algorithm are carried out showing that significant savings in terms of the number of arithmetic operations, data transfers, and twiddle factor evaluations or accesses to the lookup table can be achieved using the radix-(2/spl times/2/spl times/2)/(4/spl times/4/spl times/4) DIF FFT algorithm over the radix-(2/spl times/2/spl times/2) FFT algorithm. It is also established that further savings can be achieved by using the radix-(2/spl times/2/spl times/2)/(8/spl times/8/spl times/8) DIF FFT algorithm. 相似文献