期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

三维向量基快速傅立叶算法 总被引：1，自引：1，他引：0

吴云峰徐妮妮肖志涛侯正信《计算机应用》2009,29(2):618-621

给出了三维向量基快速傅立叶变换（3-D Vector Radix FFT）算法。对三维信号采用基2时域抽取,导出了该算法蝶形运算的一般形式。计算量比较结果显示,三维向量基FFT算法比基于行列分解的三维FFT算法计算量低,计算效率高。相似文献

2.

基2与混合基快速Fourier变换算法性能比较 总被引：1，自引：0，他引：1

金国栋刘向明陶智庄绪宗《计算机与数字工程》2010,38(3):25-27

目前快速Fourier变换算法主要有两大类,一类是针对点数为2的整数次幂,一类对应点数为其他长度的情况。在介绍基2和混合基的FFT算法原理的基础上,通过仿真数据对两种FFT算法的性能进行了比较分析。验证结果表明,基2算法在计算速度方面要占有优势,但在整周期截断的情况下,混合基快速算法却在频谱效果方面占有优势。相似文献

3.

分裂基算法在管道腐蚀超声内检测中的应用

戴波张慧平唐建《计算机与应用化学》2007,24(4):515-518

管道腐蚀内检测中超声回波信号具有周期性特点,功率谱估计是重要的数据处理方法之一。基于分裂基的FFT算法具有较小的乘法次数和加法次数,且算法结构较好。采用频率抽取分裂基2／4 FFT算法对管道腐蚀超声内检测回波信号进行了处理．得到管道壁厚数据,经分裂基FFT算法和基2 FFT算法比较,分裂基FFT算法明显减少了数据处理时间,提高了检测速度。理论分析和实验结果表明,该分裂基算法精度高,数据处理速度快,满足管道腐蚀内检测的实时性要求。相似文献

4.

High-performance FFT algorithms for the Convex C4/XA supercomputer

Kevin R. Wadleigh Gary B. Gostin John Liu 《The Journal of supercomputing》1995,9(1-2):163-178

Some implementations of a power-of-two one-dimensional fast Fourier transform (FFT) on vector computers use radix-4 Stockham autosort kernels with a separate transpose step. This paper describes an algorithm that performs well on a Convex C4/XA vector supercomputer on large FFTs by using higher-radix kernels and moving the transpose step into the computational steps. For short transforms a different algorithm is used that calculates the FFT without storing any intermediate results to memory. Performance results using these techniques are given. 相似文献

5.

Algorithm fast fourier transforms with recursively generated trigonometric functions

Dr. K. Dobeš 《Computing》1982,29(3):263-276

New recursive formulae for trigonometric functions generation suitable for FFT algorithms are given. Evaluation of one pair of trigonometric coefficients thus requires 2 multiplications and 2 additions only. Speed comparisons of various radices 2, 4 and 8 FFT FORTRAN realizations were performed. An efficient FORTRAN IV program for one-dimensional complex FFT based on radix 8 algorithm with recursively generated trigonometric coefficients is supplied. 相似文献

6.

频域抽取多维向量基快速傅里叶变换

徐妮妮于海艳肖志涛《计算机应用》2010,30(10):2777-2780

给出了频域抽取(DIF)多维向量基快速傅里叶变换(FFT)算法。对多维频域信号的每一维,采用向量基2频域抽取法,导出了快速算法蝶形运算的一般形式。该FFT算法适合于维数为任意整数的情况,当维数为1时,算法退化为著名的频域抽取向量基2 FFT算法。为了便于编程实现,以频域抽取3维向量基FFT算法为例,给出了快速算法实现流程,该流程易于向任意整数维推广。计算量比较结果显示,频域抽取多维向量基FFT算法比多维分离式FFT算法计算量低。相似文献

7.

基于FPGA的高速数字脉冲压缩

下载免费PDF全文

王超田黎育高梅国《计算机工程》2008,34(4):252-253

研究一种基于现场可编程门阵列实现的高速脉冲压缩处理的硬件结构。设计通用的蝶形处理单元,使其在脉冲压缩处理的3个阶段都能使用,实现了硬件的共享,提高了硬件资源的利用效率。通过可使用原位运算的并行存储器结构,使得每个时钟周期均可完成一次蝶形运算,极大地提高了处理速度。采用块浮点处理单元,兼顾定点的高速率和浮点的高精度。经过实践验证,时钟在100 MHz时完成4 096点的脉冲压缩的时间为140 μs。相似文献

8.

Ultrahigh-performance FFTs for the CRAY-2 and CRAY Y-MP supercomputers

David A. Carlson 《The Journal of supercomputing》1992,6(2):107-116

In this paper a set of techniques for improving the performance of the fast Fourier transform (FFT) algorithm on modern vector-oriented supercomputers is presented. Single-processor FFT implementations based on these techniques are developed for the CRAY-2 and the CRAY Y-MP, and it is shown that they achieve higher performance than previously measured on these machines. The techniques include (1) using gather/scatter operations to maintain optimum length vectors throughout all stages of small-to medium-sized FFTs, (2) using efficient radix-8 and radix-16 inner loops, which allow a large number of vector loads/stores to be overlapped, and (3) prefetching twiddle factors as vectors so that on the CRAY-2 they can later be fetched from local memory in parallel with common memory accesses. Performance results for Fortran implementations using these techniques demonstrate that they are faster than Cray's library FFT routine CFFT2. The actual speedups obtained, which depend on the size of the FFT being computed and the supercomputer being used, range from about 5 to over 300%. 相似文献

9.

一种适用于DVB-T系统的新型FFT处理器设计

周加铳陈咏恩《计算机工程与应用》2006,42(27):16-19

针对地面数字视频广播(DVB-T)系统中高速FFT处理器的设计要求,提出了一种新的基16/8混合基算法及其实现结构。采用单个基16/8复用的蝶形运算单元顺序处理,并通过减少乘法器数目,有效降低了硬件消耗;运算单元内部采用“基4+基4/2”级联流水线方式,大大加快了运算速度;此外,应用对称乒乓RAM结构提高了蝶算单元的连续运算能力;并且使用改进的块浮点防溢出机制,以保证运算精度。仿真和实现结果表明该设计具有良好的性能,完全满足实际应用要求。相似文献

10.

The generalized unified computation of multidimensional discrete orthogonal transforms

成礼智蒋增荣张振慧《中国科学F辑(英文版)》2001,44(6):401-411

By introducing a form of reorder for multidimensional data, we propose a unified fast algo-rithm that jointly employs one-dimensional W transform and multidimensional discrete polynomial trans-form to compute eleven types of multidimensional discrete orthogonal transforms, which contain three types of m-dimensional discrete cosine transforms ( m-D DCTs) ,four types of m-dimensional discrete W transforms ( m-D DWTs) ( m-dimensional Hartley transform as a special case), and four types of generalized discrete Fourier transforms ( m-D GDFTs). For real input, the number of multiplications for all eleven types of the m-D discrete orthogonal transforms needed by the proposed algorithm are only 1/m times that of the commonly used corresponding row-column methods, and for complex input, it is further reduced to 1/(2m) times. The number of additions required is also reduced considerably. Furthermore, the proposed algorithm has a simple computational structure and is also easy to be im-plemented on computer, and th 相似文献

11.

VLSI architecture for low latency radix-4 CORDIC

B. Lakshmi A.S. Dhar 《Computers & Electrical Engineering》2011,37(6):1032-1042

The CORDIC algorithm, originally proposed using nonredundant radix-2 arithmetic, has been refined in terms of throughput and latency with the introduction of redundant arithmetic and higher radix techniques. In this paper, we propose a pipelined architecture using signed digit arithmetic for the VLSI efficient implementation of rotational radix-4 CORDIC algorithm, eliminating z path completely. A detailed comparison of the proposed architecture with the available radix-2 architectures shows the latency and hardware improvement. The proposed architecture achieves latency improvement over the previously proposed radix-4 architecture with a relatively small hardware overhead. The proposed architecture for 16-bit precision was implemented using VHDL and extensive simulations have been performed to validate the results. The functionally simulated net list has been synthesized for 16-bit precision with 90 nm CMOS technology library and the area-time measures are provided. This architecture was also implemented using Xilinx ISE9.1 software and a Virtex device. 相似文献

12.

Performance Analysis of FFT Algorithms on Multiprocessor Systems

《IEEE transactions on pattern analysis and machine intelligence》1983,(4):512-521

A decimation-in-time radix-2 fast Fourier transform (FFT) algorithm is considered here for implementation in multiprocessors with shared bus, multistage interconnection network (MIN), and in mesh connected computers. Results are derived for data allocation, interprocessor communication, approximate computation time, and speedup of an N point FFT on any P available processing elements (PE's). Further generalization is obtained for a radix-r FFT algorithm. An N X N point two-dimensional discrete Fourier transform (DFT) implementation is also considered when one or more rows of the input data matrix are allocated to each PE. 相似文献

13.

应用于OFDM系统中的高速FFT处理器设计

陶金李林森《微机发展》2006,16(6):116-118

针对无线城域网中工作在2GHz~11GHz频带的IEEE802.16a标准,在实现其OFDM系统时提出一种高速而且经济的FFT处理器设计方案。设计中采用了Radix-4的频率抽取算法和并行的蝶型计算单元结构,而且将旋转因子预先存储在ROM中以提高处理器运行的速度。设计方案采用了单个蝶型运算单元以达到控制FFT处理器规模的目的。数据的输入与输出都共用一个存储器,这进一步节约了硬件资源损耗。相似文献

14.

A VLSI constant geometry architecture for the fast Hartley andFourier transforms

Zapata E.L. Arguello F. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(1):58-70

An application-specific architecture for the parallel calculation of the decimation in time and radix 2 fast Hartley (FHT) and Fourier (FFT) transforms is presented. A real sequence with N=2ⁿ data items is considered as input. The system calculates the FHT and the FFT in n and n+1 stages. respectively. The modular and regular parallel architecture is based on a constant geometry algorithm using butterflies of four data items and the perfect unshuffle permutation. With this permutation, the mapping of the algorithm in VLSI technology is simplified and the communications among processors are minimized. Organization of the processor memory based on first-in, first-out (FIFO) queues facilitates a systolic data flow and permits the implementation in a direct way of the complex data movements and address sequences of the transforms. This is accomplished by means of simple multiplexing operations, using hardwired control. The total calculation time is (Nlog₂N)/4Q cycles for the FHT and N(1+log₂N)/4Q cycles for the FFT, where Q is the number of processors ( Q= 2^q, Q⩽N/4) 相似文献

15.

基于基2-FFT的GPS信号捕获算法研究 总被引：1，自引：0，他引：1

李欣周辉刘峰龙腾《微计算机信息》2010,(7)

讨论了各种GPS信号捕获算法,针对采用圆周相关法进行GPS信号捕获中FFT点数为非2的整数次幂的问题,分析了传统补零法和内插法的缺陷,并提出了一种新的基2-FFT捕获算法,该算法通过将采样数据和本地参考码均补零到大于自身长度两倍后做基2-FFT进行圆周相关,理论分析和仿真表明,该算法可以得到与非2的整数次幂点数的圆周相关一样的结果。相似文献

16.

Reconfigurable Superconducting FFT Processor Using Bit-Slice Block Share Processing Unit

《Microprocessors and Microsystems》2020

We have proposed a reconfigurable high speed and very economical Rapid Single Flux Quantum (RSFQ) superconducting logic design based on the Fast Fourier Transform (FFT) Processor. We have designed a 256 – point FFT processor with the help of a bit-slicing block sharing unit. RSFQ is one of the superconducting device logics comprises of Josephson Junction. The computation complexity of this superconducting FFT is less when the number of points increased. We have proposed three different designs depending on the split radix FFT, the bit-serial radix 2 FFT, and the mixed radix FFT algorithms. The proposed design will slice the 256 – point FFT into eight 32 – point FFT each and each 32 – point FFT is divided into eight 4 – point FFT each for the reduction in hardware cost. For complex multiplication, the computation complexity of our design will be less than N/2 Log₂ N for the radix 2 algorithm based on the Block share processing Unit (BSPU) and further, it is reduced for split radix & mixed radix algorithms based on BSPU based RSFQ logic. Due to this, the speed of the processor is improvised compared to general FFT algorithm based semiconductor technology. we have computed and calculated the latency at 10 GHz for our designs. The main aim of this proposed design is to reduce the complex computation time and better performance of the processor with less hardware cost. This proposed design can furthermore continue to several N² – point by using synchronous clock tree. 相似文献

17.

高性能基4快速傅里叶变换处理器的设计 总被引：4，自引：1，他引：3

下载免费PDF全文

段小东顾立志《计算机工程》2008,34(24):238-240

研究并设计高性能基4快速傅里叶变换(FFT)处理器。采用基4算法、流水线结构的蝶形运算单元,提高了处理速度,使芯片能在更高的时钟频率上工作。运用溢出检测状态机对每个蝶形运算单元输出的数据进行块浮点检查,确保对溢出情况进行正确判断。验证与性能评估结果表明,该FFT处理器具有较高性能。相似文献

18.

A simple and efficient parallel FFT algorithm using the BSP model 总被引：1，自引：0，他引：1

Mrcia A. Inda Rob H. Bisseling 《Parallel Computing》2001,27(14):1847-1878

We present a new parallel radix-4 FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of 3, in the case that the input/output vector is in the cyclic distribution. We also show how to reduce computation time on computers with a cache-based architecture. We present performance results on a Cray T3E with up to 64 processors, obtaining reasonable efficiency levels for local problem sizes as small as 256 and very good efficiency levels for local sizes larger than 2048. 相似文献

19.

基于FPGA的片内并行机网络模型优化策略分析

李长松《电子技术应用》2008,34(9)

针对基-2 FFT 处理算法,采用分块存储思想,将存储器、处理机数据交换网络模型进行优化。优化后的网络模型数据通路数仅为20,降低为原来的4%以下,且不随 FFT 计算点数增多而增加。整个设计在 Virtex 系统芯片 XCV800上实现。相似文献

20.

改进的多路基-24 FFT处理器设计 总被引：1，自引：1，他引：0

下载免费PDF全文

汪文义王琳凯周金元周晓方《计算机工程》2011,37(7):262-264

给出一种改进的基-24频域抽取FFT算法,基于该算法和SDF结构,提出改进的多路基-24 FFT处理器结构,通过复用常复系数乘法器,减少硬件消耗并维持吞吐率不变。基于改进结构设计2路256点FFT处理器,在SMIC 0.13 μm工艺下综合、布局和布线后的版图核心面积为1.12 mm2,最高工作频率为100 MHz。相似文献