共查询到20条相似文献,搜索用时 21 毫秒
1.
2.
基2与混合基快速Fourier变换算法性能比较 总被引:1,自引:0,他引:1
目前快速Fourier变换算法主要有两大类,一类是针对点数为2的整数次幂,一类对应点数为其他长度的情况。在介绍基2和混合基的FFT算法原理的基础上,通过仿真数据对两种FFT算法的性能进行了比较分析。验证结果表明,基2算法在计算速度方面要占有优势,但在整周期截断的情况下,混合基快速算法却在频谱效果方面占有优势。 相似文献
3.
管道腐蚀内检测中超声回波信号具有周期性特点,功率谱估计是重要的数据处理方法之一。基于分裂基的FFT算法具有较小的乘法次数和加法次数,且算法结构较好。采用频率抽取分裂基2/4 FFT算法对管道腐蚀超声内检测回波信号进行了处理.得到管道壁厚数据,经分裂基FFT算法和基2 FFT算法比较,分裂基FFT算法明显减少了数据处理时间,提高了检测速度。理论分析和实验结果表明,该分裂基算法精度高,数据处理速度快,满足管道腐蚀内检测的实时性要求。 相似文献
4.
Some implementations of a power-of-two one-dimensional fast Fourier transform (FFT) on vector computers use radix-4 Stockham autosort kernels with a separate transpose step. This paper describes an algorithm that performs well on a Convex C4/XA vector supercomputer on large FFTs by using higher-radix kernels and moving the transpose step into the computational steps. For short transforms a different algorithm is used that calculates the FFT without storing any intermediate results to memory. Performance results using these techniques are given. 相似文献
5.
Dr. K. Dobeš 《Computing》1982,29(3):263-276
New recursive formulae for trigonometric functions generation suitable for FFT algorithms are given. Evaluation of one pair of trigonometric coefficients thus requires 2 multiplications and 2 additions only. Speed comparisons of various radices 2, 4 and 8 FFT FORTRAN realizations were performed. An efficient FORTRAN IV program for one-dimensional complex FFT based on radix 8 algorithm with recursively generated trigonometric coefficients is supplied. 相似文献
6.
7.
8.
David A. Carlson 《The Journal of supercomputing》1992,6(2):107-116
In this paper a set of techniques for improving the performance of the fast Fourier transform (FFT) algorithm on modern vector-oriented supercomputers is presented. Single-processor FFT implementations based on these techniques are developed for the CRAY-2 and the CRAY Y-MP, and it is shown that they achieve higher performance than previously measured on these machines. The techniques include (1) using gather/scatter operations to maintain optimum length vectors throughout all stages of small-to medium-sized FFTs, (2) using efficient radix-8 and radix-16 inner loops, which allow a large number of vector loads/stores to be overlapped, and (3) prefetching twiddle factors as vectors so that on the CRAY-2 they can later be fetched from local memory in parallel with common memory accesses. Performance results for Fortran implementations using these techniques demonstrate that they are faster than Cray's library FFT routine CFFT2. The actual speedups obtained, which depend on the size of the FFT being computed and the supercomputer being used, range from about 5 to over 300%. 相似文献
9.
针对地面数字视频广播(DVB-T)系统中高速FFT处理器的设计要求,提出了一种新的基16/8混合基算法及其实现结构。采用单个基16/8复用的蝶形运算单元顺序处理,并通过减少乘法器数目,有效降低了硬件消耗;运算单元内部采用“基4+基4/2”级联流水线方式,大大加快了运算速度;此外,应用对称乒乓RAM结构提高了蝶算单元的连续运算能力;并且使用改进的块浮点防溢出机制,以保证运算精度。仿真和实现结果表明该设计具有良好的性能,完全满足实际应用要求。 相似文献
10.
By introducing a form of reorder for multidimensional data, we propose a unified fast algo-rithm that jointly employs one-dimensional W transform and multidimensional discrete polynomial trans-form to compute eleven types of multidimensional discrete orthogonal transforms, which contain three types of m-dimensional discrete cosine transforms ( m-D DCTs) ,four types of m-dimensional discrete W transforms ( m-D DWTs) ( m-dimensional Hartley transform as a special case), and four types of generalized discrete Fourier transforms ( m-D GDFTs). For real input, the number of multiplications for all eleven types of the m-D discrete orthogonal transforms needed by the proposed algorithm are only 1/m times that of the commonly used corresponding row-column methods, and for complex input, it is further reduced to 1/(2m) times. The number of additions required is also reduced considerably. Furthermore, the proposed algorithm has a simple computational structure and is also easy to be im-plemented on computer, and th 相似文献
11.
The CORDIC algorithm, originally proposed using nonredundant radix-2 arithmetic, has been refined in terms of throughput and latency with the introduction of redundant arithmetic and higher radix techniques. In this paper, we propose a pipelined architecture using signed digit arithmetic for the VLSI efficient implementation of rotational radix-4 CORDIC algorithm, eliminating z path completely. A detailed comparison of the proposed architecture with the available radix-2 architectures shows the latency and hardware improvement. The proposed architecture achieves latency improvement over the previously proposed radix-4 architecture with a relatively small hardware overhead. The proposed architecture for 16-bit precision was implemented using VHDL and extensive simulations have been performed to validate the results. The functionally simulated net list has been synthesized for 16-bit precision with 90 nm CMOS technology library and the area-time measures are provided. This architecture was also implemented using Xilinx ISE9.1 software and a Virtex device. 相似文献
12.
《IEEE transactions on pattern analysis and machine intelligence》1983,(4):512-521
A decimation-in-time radix-2 fast Fourier transform (FFT) algorithm is considered here for implementation in multiprocessors with shared bus, multistage interconnection network (MIN), and in mesh connected computers. Results are derived for data allocation, interprocessor communication, approximate computation time, and speedup of an N point FFT on any P available processing elements (PE's). Further generalization is obtained for a radix-r FFT algorithm. An N X N point two-dimensional discrete Fourier transform (DFT) implementation is also considered when one or more rows of the input data matrix are allocated to each PE. 相似文献
13.
针对无线城域网中工作在2GHz~11GHz频带的IEEE802.16a标准,在实现其OFDM系统时提出一种高速而且经济的FFT处理器设计方案。设计中采用了Radix-4的频率抽取算法和并行的蝶型计算单元结构,而且将旋转因子预先存储在ROM中以提高处理器运行的速度。设计方案采用了单个蝶型运算单元以达到控制FFT处理器规模的目的。数据的输入与输出都共用一个存储器,这进一步节约了硬件资源损耗。 相似文献
14.
An application-specific architecture for the parallel calculation of the decimation in time and radix 2 fast Hartley (FHT) and Fourier (FFT) transforms is presented. A real sequence with N =2n data items is considered as input. The system calculates the FHT and the FFT in n and n +1 stages. respectively. The modular and regular parallel architecture is based on a constant geometry algorithm using butterflies of four data items and the perfect unshuffle permutation. With this permutation, the mapping of the algorithm in VLSI technology is simplified and the communications among processors are minimized. Organization of the processor memory based on first-in, first-out (FIFO) queues facilitates a systolic data flow and permits the implementation in a direct way of the complex data movements and address sequences of the transforms. This is accomplished by means of simple multiplexing operations, using hardwired control. The total calculation time is (N log2N )/4Q cycles for the FHT and N (1+log2N )/4Q cycles for the FFT, where Q is the number of processors ( Q = 2q, Q ⩽N /4) 相似文献
15.
16.
We have proposed a reconfigurable high speed and very economical Rapid Single Flux Quantum (RSFQ) superconducting logic design based on the Fast Fourier Transform (FFT) Processor. We have designed a 256 – point FFT processor with the help of a bit-slicing block sharing unit. RSFQ is one of the superconducting device logics comprises of Josephson Junction. The computation complexity of this superconducting FFT is less when the number of points increased. We have proposed three different designs depending on the split radix FFT, the bit-serial radix 2 FFT, and the mixed radix FFT algorithms. The proposed design will slice the 256 – point FFT into eight 32 – point FFT each and each 32 – point FFT is divided into eight 4 – point FFT each for the reduction in hardware cost. For complex multiplication, the computation complexity of our design will be less than N/2 Log2 N for the radix 2 algorithm based on the Block share processing Unit (BSPU) and further, it is reduced for split radix & mixed radix algorithms based on BSPU based RSFQ logic. Due to this, the speed of the processor is improvised compared to general FFT algorithm based semiconductor technology. we have computed and calculated the latency at 10 GHz for our designs. The main aim of this proposed design is to reduce the complex computation time and better performance of the processor with less hardware cost. This proposed design can furthermore continue to several N2 – point by using synchronous clock tree. 相似文献
17.
18.
We present a new parallel radix-4 FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of 3, in the case that the input/output vector is in the cyclic distribution. We also show how to reduce computation time on computers with a cache-based architecture. We present performance results on a Cray T3E with up to 64 processors, obtaining reasonable efficiency levels for local problem sizes as small as 256 and very good efficiency levels for local sizes larger than 2048. 相似文献
19.
针对基-2 FFT 处理算法,采用分块存储思想,将存储器、处理机数据交换网络模型进行优化。优化后的网络模型数据通路数仅为20,降低为原来的4%以下,且不随 FFT 计算点数增多而增加。整个设计在 Virtex 系统芯片 XCV800上实现。 相似文献