1.

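Research on an FFT Twiddle Factor Generation Algorithm

许伟涛, 王乘 《计算机技术与发展》 2005, 15(6)

The FFT is a very important algorithm in digital signal processing. This paper proposes an efficient twiddle factor generation algorithm for the radix-2 FFT. It reduces both the memory needed to store twiddle factors and the number of memory reads, which in turn reduces hardware area and power consumption, and it is of practical value in concrete applications.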
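The abstract does not describe the algorithm itself. As a hedged illustration of how twiddle factor storage can be reduced for a radix-2 FFT, the sketch below uses a well-known symmetry trick rather than the authors' method: only a quarter-wave sine table of N/4 + 1 entries is stored, and every twiddle factor W_N^k = exp(-2πik/N) for 0 ≤ k < N/2 is rebuilt from it. The names `quarter_sine_table` and `twiddle` are hypothetical.

```python
import cmath
import math


def quarter_sine_table(n):
    """Store sin(2*pi*j/n) for j = 0 .. n/4 only (n must be a multiple of 4)."""
    return [math.sin(2.0 * math.pi * j / n) for j in range(n // 4 + 1)]


def twiddle(table, n, k):
    """Rebuild W_n^k = cos(2*pi*k/n) - j*sin(2*pi*k/n) for 0 <= k < n/2.

    Assumption: this is a generic symmetry trick, not the paper's algorithm.
    """
    q = n // 4
    if k <= q:
        # First quadrant: cos(t) = sin(pi/2 - t).
        c, s = table[q - k], table[k]
    else:
        # Second quadrant: cos(t) = -sin(t - pi/2), sin(t) = sin(pi - t).
        c, s = -table[k - q], table[2 * q - k]
    return complex(c, -s)


if __name__ == "__main__":
    n = 64
    table = quarter_sine_table(n)
    worst = max(abs(twiddle(table, n, k) - cmath.exp(-2j * cmath.pi * k / n))
                for k in range(n // 2))
    print(len(table), "stored values, worst error", worst)
```

A full table for the same transform would hold N/2 complex values (N real numbers), so the quarter-wave table cuts storage by roughly a factor of four at the cost of a little index arithmetic and a sign change.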

2.

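Research on an FFT Twiddle Factor Generation Algorithm

许伟涛, 王乘 《微机发展》 2005, 15(6): 42-44, 47

The FFT is a very important algorithm in digital signal processing. This paper proposes an efficient twiddle factor generation algorithm for the radix-2 FFT. It reduces both the memory needed to store twiddle factors and the number of memory reads, which in turn reduces hardware area and power consumption, and it is of practical value in concrete applications.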

3.

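Design of a Fully Data-Parallel FFT Processor  Total citations: 5, self-citations: 0, citations by others: 5

谢应科, 付博 《计算机研究与发展》 2004, 41(6): 1022-1029

This paper discusses the design of FFT processors based on radix-4 and mixed-radix algorithms. The proposed operand address mapping method fully exploits the in-place property of the FFT and can supply the four operands of a butterfly simultaneously, giving maximum data parallelism. Under the twiddle factor storage rule, the three twiddle factors needed by a butterfly share the same address, and the addressing scheme is simple. The arithmetic unit uses a complex multiplication algorithm with three real multiplications, which effectively reduces its size; it can perform either one radix-4 butterfly or two radix-2 butterflies at the same time. Implemented on an Altera EP200K400E, the design runs at 89 MHz; a 1024-point 16-bit complex FFT takes 14.1 μs and a 4096-point FFT takes 67 μs.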
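The abstract mentions a complex multiplication that uses three real multiplications instead of four but does not give the formula. One standard formulation is sketched below; whether the processor uses exactly this arrangement is an assumption, and `complex_mul_3` is a hypothetical name.

```python
def complex_mul_3(a, b, c, d):
    """Return (a + bj) * (c + dj) as (real, imag) using three multiplications.

    k1 = c * (a + b), k2 = a * (d - c), k3 = b * (c + d)
    real = k1 - k3 = a*c - b*d
    imag = k1 + k2 = a*d + b*c
    """
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2


if __name__ == "__main__":
    print(complex_mul_3(3, 4, 5, 6))  # (3 + 4j) * (5 + 6j)
```

When `c + dj` is a twiddle factor, the terms `d - c` and `c + d` depend only on the twiddle factor and can be precomputed, so the trade is one multiplier for a few extra adders.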

4.

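Design and Implementation of a High-Speed Fixed-Point FFT Processor  Total citations: 3, self-citations: 0, citations by others: 3

付博, 李栋, 谢应科 《计算机工程》 2005, 31(11): 52-55

This paper proposes a design method for a high-speed fixed-point FFT processor. Building on the CORDIC algorithm, the method optimizes the operand address mapping and the twiddle factor generation so that one radix-4 butterfly completes every cycle, giving maximum parallelism. With the proposed factor generation method, three twiddle factors are generated per cycle, the hardware is simple, and no additional ROM is required. The whole system was simulated on a Xilinx XCV2P30 and reached a system frequency of 130 MHz; a 1k-point 16-bit complex FFT takes 9.8 μs and a 16k-point FFT takes 221 μs, which is better than the vast majority of existing FFT processors.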
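For readers unfamiliar with CORDIC, the following floating-point sketch shows the textbook rotation-mode iteration, which rotates a vector by a given angle using only shift-and-add style updates plus a short arctangent table. It is not the processor's fixed-point datapath or its optimized generation scheme; the function name `cordic_rotate` and the default of 24 iterations are assumptions for illustration.

```python
import math


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians with rotation-mode CORDIC.

    Valid for |angle| <= pi/2. In hardware the factors 2**-i are bit shifts;
    floating point is used here only to keep the sketch short.
    """
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    # The magnitude grows by a constant gain that depends only on the
    # iteration count, so it can be corrected once at the end.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]
    return x / gain, y / gain


if __name__ == "__main__":
    # Twiddle factor W_32^3 = exp(-2j*pi*3/32) obtained by rotating (1, 0).
    theta = -2.0 * math.pi * 3 / 32
    print(cordic_rotate(1.0, 0.0, theta), (math.cos(theta), math.sin(theta)))
```

Rotating the data sample directly by the twiddle angle, instead of rotating (1, 0) first, performs the twiddle multiplication itself without a multiplier.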

5.

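Design and Implementation of an FFT on a New Type of FPGA

李岩, 徐金甫 《计算机工程与应用》 2007, 43(14): 102-105

To increase speed in FPGA-based FFT designs, this paper proposes storing the twiddle factors in shift registers and verifies the approach on an Altera Stratix-series FPGA. Experimental results show that, compared with the common approach of using ROM as the twiddle factor memory, the method greatly increases FFT processing speed and better meets the requirements of real-time FFT processing.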

6.

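An Extensible Twiddle Factor Table and FFT Algorithm  Total citations: 1, self-citations: 0, citations by others: 1

李青, 王能超, 郑楚光 《计算机学报》 2002, 25(4): 392-396

This paper proposes a twiddle factor table in bit-reversed order for computing the fast Fourier transform. The table is extensible: in essence its entries do not depend on the number of transform points, so when the point count changes the table either needs no recomputation or is easily extended. Based on this table, the paper designs a regularly structured, essentially radix-4 algorithm and software program for computing 2^n-point FFTs and compares the program experimentally with the FFTW package. Taking protein sequence similarity computation as an example, the authors' algorithm is also compared with the corresponding algorithm in FFTW; the results show that the proposed algorithm saves about 31.7% of the computation time.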
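The extensibility property can be illustrated with a small sketch. It assumes the table stores exp(-jπ·r(k)), where r(k) is the base-2 radical inverse of k (the bits of k mirrored about the binary point); this is one common way to lay out a bit-reversed twiddle table and may differ in detail from the paper's. The names `radical_inverse` and `bit_reversed_twiddles` are hypothetical.

```python
import cmath


def radical_inverse(k):
    """Mirror the bits of k about the binary point: 1 -> 0.5, 2 -> 0.25, 3 -> 0.75."""
    result, scale = 0.0, 0.5
    while k:
        if k & 1:
            result += scale
        k >>= 1
        scale *= 0.5
    return result


def bit_reversed_twiddles(count):
    """First `count` entries of the table; entry k never depends on `count`.

    Assumption: with count = N/2 this holds W_N^j for j = 0 .. N/2 - 1 in
    bit-reversed order, because bitrev(k) / (N/2) equals radical_inverse(k).
    """
    return [cmath.exp(-1j * cmath.pi * radical_inverse(k)) for k in range(count)]


if __name__ == "__main__":
    small = bit_reversed_twiddles(8)    # enough for a 16-point transform
    large = bit_reversed_twiddles(16)   # enough for a 32-point transform
    print(large[:8] == small)           # the smaller table is a prefix
```

Because entry k is defined without reference to N, growing the transform size only appends entries to the table, which matches the behavior the abstract describes.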

7.

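An Improved High-Radix CORDIC Algorithm and Its Application to the FFT

王冬格, 周晓方 《计算机工程与应用》 2014, 50(7): 41-45

This paper proposes an improved high-radix CORDIC algorithm that significantly reduces the number of iterations required by the conventional CORDIC algorithm while keeping the magnitude correction factor constant. The algorithm suits cases where the rotation angle is known in advance, such as twiddle factor multiplication in FFT computation. The resulting complex multiplier module was synthesized in the SMIC 0.13 μm process. Compared with a general-purpose complex multiplier, the proposed structure saves 19.2% of the hardware area and 29.1% of the ROM area, while the SQNR exceeds 83 dB, which meets practical requirements.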

8.

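Application of Hybrid CORDIC in the Split-Radix FFT

万书芹, 阮园, 于宗光, 王国璋, 李天阳 《计算机工程与应用》 2010, 46(11): 73-76

This paper proposes a CORDIC-based split-radix FFT/IFFT processor for computing 2048/4096/8192-point DFTs. The arithmetic unit of the butterfly processor and the twiddle factor generator are implemented with the CORDIC algorithm, and all control signals are generated on chip. The CORDIC twiddle factor generator needs a smaller ROM than a stored twiddle factor table would. Power consumption is 25% lower than that of a conventional FFT implementation.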

9.

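A Parallel Computation Method for Very Long FFTs on the TMS320C6678

王春辉, 程虎, 朱鸿泰, 李敏, 张炯 《单片机与嵌入式系统应用》 2022, 22(1): 51-54, 59

To address the implementation and efficiency problems of very long FFTs that arise when porting algorithms to DSP platforms, this paper targets TI's TMS320C6678 DSP. It exploits the FFT decomposition algorithm and the efficient access of the L2 memory segment, and it overlaps EDMA transfers of DSP memory data with the decomposed FFT computation, yielding a parallel processing method for very long FFTs. The implementation is described using a 262144-point FFT, and the DSP computation...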

10.

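Circuit Design of a High-Performance 1024-Point FFT Algorithm

张锦红, 叶甜春, 徐建华 《微计算机信息》 2009, 25(8)

To meet the demand for high-speed, large-scale FFT processors, this paper proposes a hardware architecture for a radix-4 decimation-in-time dual-channel FFT algorithm. It replaces one large SRAM with four small dual-port SRAMs and uses a multi-stage pipeline. The architecture can read and write the four operands and four intermediate results of a butterfly in parallel across the four memories, which greatly increases processing speed. The CORDIC algorithm replaces conventional multipliers, eliminating much of the ROM for twiddle factor tables as well as the multiplier hardware and so reducing circuit area; a channel shut-off technique further reduces power consumption. Hardware verification shows that, with a 100 MHz system clock, a 1024-point 20-bit complex FFT takes about 10 μs on average.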

11.

An area- and energy-efficient hybrid architecture for floating-point FFT computations

《Microprocessors and Microsystems》2019

Floating-point fast Fourier transform (FFT) is widely sought after in scientific computing and high-resolution imaging applications because of its wide dynamic range and high processing precision. However, it suffers from high area and energy overhead in comparison with fixed-point implementations. To address these issues, this paper presents an area- and energy-efficient hybrid architecture for floating-point FFT computations. It minimizes the required arithmetic units and reduces the memory usage significantly by combining three different parts. The serial radix-4 butterfly (SR4BF) is used in the single-path delay commutator (SDC) part to minimize the required arithmetic units, with a 100% adder utilization ratio. A modified single-path delay feedback (MSDF) architecture is proposed to achieve a tradeoff between arithmetic resources and memory usage by using the new half radix-4 butterfly (HR4BF), with a 50% adder utilization ratio. The intermediate caching buffer is modified accordingly in the MSDF part. By combining the advantages of arithmetic unit reduction and memory usage optimization in the different parts, the optimized area and power are obtained without throughput loss. The logic synthesis results in a 65 nm CMOS technology show that the energy per FFT is about 331.5 nJ for 1024-point FFT computations at 400 MHz. The total hardware overhead is equivalent to 460k NAND2 gates.

12.

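Design and Implementation of a High-Speed Embedded Processor Based on Memory Techniques  Total citations: 1, self-citations: 0, citations by others: 1

张钦, 韩承德 《计算机学报》 2007, 30(5): 831-837

SoPC (System on a Programmable Chip) is widely used in embedded systems and is usually implemented on an FPGA (Field Programmable Gate Array). One class of embedded processors, such as wavelet transform processors, compression and decompression processors, and FFT processors, can be designed with a memory-based approach. Because on-chip FPGA memory is relatively scarce, how to use it effectively to build high-speed embedded processors is a problem that needs study. The paper uses an FFT processor to demonstrate the effectiveness of the approach: one address mapping scheduling strategy and two conflict-free operand address mapping schemes reduce the FPGA on-chip memory used and increase processing speed. The FFT processor played a key role in a real system.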
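The abstract does not spell out its two conflict-free mappings. As a hedged illustration of what "conflict-free" means for an in-place radix-2 FFT, the sketch below uses the classic parity rule, which is not necessarily either of the paper's schemes: an address goes to bank 0 or bank 1 according to the parity of its set bits. The two operands of any butterfly differ in exactly one address bit, so they always land in different banks and can be read in the same cycle. The names `bank_of` and `butterfly_pairs` are hypothetical.

```python
def bank_of(address):
    """Bank index = parity of the number of 1 bits in the address."""
    return bin(address).count("1") & 1


def butterfly_pairs(n, stage):
    """Operand address pairs of an in-place radix-2 FFT stage (span = 2**stage)."""
    span = 1 << stage
    return [(a, a | span) for a in range(n) if not a & span]


if __name__ == "__main__":
    n = 16
    for stage in range(4):
        conflicts = [(a, b) for a, b in butterfly_pairs(n, stage)
                     if bank_of(a) == bank_of(b)]
        print("stage", stage, "conflicts:", conflicts)
```

Each bank ends up holding exactly half of the addresses, so the two-bank memory needs no more total storage than a single-bank one.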

13.

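Analysis of Optimization Strategies for an FPGA-Based On-Chip Parallel Machine Network Model

李长松 《电子技术应用》 2008, 34(9)

For the radix-2 FFT algorithm, a block storage scheme is used to optimize the network model for data exchange between memories and processors. The optimized network model has only 20 data paths, less than 4% of the original number, and this count does not grow as the number of FFT points increases. The whole design is implemented on a Virtex XCV800 chip.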

14.

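A Memory-Cell Diagram Method for Analyzing Real-Sequence FFT Algorithms

赵鸿图, 陈书平, 吴尧辉 《计算机工程与设计》 2012, 33(8): 3083-3088

To develop assembly-language programs for real-sequence FFTs correctly and efficiently, this paper proposes analyzing the real-sequence FFT algorithm by means of memory-cell diagrams. It first derives formulas for computing the real and imaginary parts of a real-sequence FFT from those of a complex-sequence FFT, and it identifies the three nested loops of a complex FFT (stage, butterfly group, butterfly), how the required sine values are computed and stored, and the steps for converting a complex-sequence FFT into a real-sequence FFT. On this basis, memory-cell diagrams are used to work through the real and imaginary part formulas of the real-sequence FFT in detail in the TMS320C54X assembly-language environment. Memory-cell diagrams are designed for the first stage, the second stage, and the third through last stages of the complex FFT computation, for deriving the real and imaginary parts of the conjugate-symmetric and conjugate-antisymmetric components from the complex FFT result, and for computing the real-sequence FFT from those components. Simulation results under CCS 3.3 verify the correctness of the method.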
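The conversion from a complex-sequence FFT to a real-sequence FFT can be sketched as follows. This is the standard packing technique written in Python rather than TMS320C54X assembly, and it is an assumption that the paper's formulas take this form. A real sequence of length 2M is packed into M complex samples, one M-point transform is computed, and its conjugate-symmetric and conjugate-antisymmetric parts give the spectra of the even and odd samples. A naive DFT stands in for the FFT to keep the sketch short; `dft` and `real_fft_via_half_length` are hypothetical names.

```python
import cmath


def dft(z):
    """Naive O(M^2) DFT, used here only as a stand-in for a complex FFT."""
    m = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * k * n / m) for n in range(m))
            for k in range(m)]


def real_fft_via_half_length(x):
    """Bins 0 .. M of the DFT of a real sequence x of even length 2M."""
    m = len(x) // 2
    # Pack even samples into the real part and odd samples into the imaginary part.
    zf = dft([complex(x[2 * n], x[2 * n + 1]) for n in range(m)])
    out = []
    for k in range(m):
        zc = zf[(m - k) % m].conjugate()
        even = (zf[k] + zc) / 2       # conjugate-symmetric part: DFT of x[0::2]
        odd = (zf[k] - zc) / 2j       # from the antisymmetric part: DFT of x[1::2]
        out.append(even + cmath.exp(-2j * cmath.pi * k / (2 * m)) * odd)
    # Nyquist bin: E[0] - O[0], where E[0] = Re Z[0] and O[0] = Im Z[0].
    out.append(complex(zf[0].real - zf[0].imag, 0.0))
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 4.0]
    fast, ref = real_fft_via_half_length(x), dft(x)
    print(max(abs(a - b) for a, b in zip(fast, ref)))
```

The remaining bins M + 1 to 2M - 1 follow from conjugate symmetry of a real signal's spectrum, so they need not be computed.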

15.

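Research on FPGA-Based FFT Computation Methods

李维, 李颖 《工业控制计算机》 2011, 24(7): 50-51

Requirements on fast code acquisition speed keep rising, and the most widely used code acquisition method today is FFT-based fast code acquisition, so a fast, simple, and practical FFT computation method is needed. Using the rich resources of the FPGA and its flexible IP core facility greatly simplifies the design flow and provides a convenient, quick way to implement the FFT algorithm. Simulation and experimental results show that the method is accurate, reliable, and fast.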

16.

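Research on a GPS Signal Acquisition Algorithm Based on the Radix-2 FFT  Total citations: 1, self-citations: 0, citations by others: 1

李欣, 周辉, 刘峰, 龙腾 《微计算机信息》 2010, (7)

This paper discusses various GPS signal acquisition algorithms. For circular-correlation-based acquisition in which the FFT length is not an integer power of 2, it analyzes the shortcomings of the conventional zero-padding and interpolation methods and proposes a new radix-2 FFT acquisition algorithm. The algorithm zero-pads both the sampled data and the local reference code to more than twice their own length and then performs the circular correlation with radix-2 FFTs. Theoretical analysis and simulation show that it yields the same result as circular correlation at the original non-power-of-2 length.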
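The idea can be sketched as below. Both sequences are zero-padded to a power of 2 of at least twice their length, correlated with radix-2 FFTs, and the positive-lag and negative-lag halves are then folded together to recover the length-N circular correlation. The folding step is an assumption, since the abstract does not say how the padded result is mapped back, and all function names are hypothetical.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def circular_correlation_radix2(x, c):
    """R[m] = sum_i x[(i + m) mod N] * conj(c[i]) for any N, via padded FFTs."""
    n = len(x)
    size = 1
    while size < 2 * n:               # pad to a power of 2 that is >= 2N
        size *= 2
    xf = fft(list(x) + [0] * (size - n))
    cf = fft(list(c) + [0] * (size - n))
    q = fft([a * b.conjugate() for a, b in zip(xf, cf)], inverse=True)
    q = [v / size for v in q]         # q holds the linear correlation
    # Fold lag m - N (stored at the end of q) onto lag m.
    return [q[m] + q[(m - n) % size] for m in range(n)]


def circular_correlation_direct(x, c):
    """O(N^2) reference implementation of the same definition."""
    n = len(x)
    return [sum(x[(i + m) % n] * complex(c[i]).conjugate() for i in range(n))
            for m in range(n)]


if __name__ == "__main__":
    x = [1, -1, 1, 1, -1, 1, -1]      # length 7, not a power of 2
    c = [1, 1, -1, 1, -1, -1, 1]
    fast = circular_correlation_radix2(x, c)
    ref = circular_correlation_direct(x, c)
    print(max(abs(a - b) for a, b in zip(fast, ref)))
```

Padding to at least 2N keeps the positive and negative lags from overlapping inside the longer transform, which is why simply padding to the next power of 2 above N does not give the same result.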

17.

Compilation and Communication Strategies for Out-of-Core Programs on Distributed Memory Machines

Rajesh Bordawekar Alok Choudhary J. Ramanujam 《Journal of Parallel and Distributed Computing》1996,38(2):277

It is widely acknowledged that improving parallel I/O performance is critical for widespread adoption of high performance computing. In this paper, we show that communication in out-of-core distributed memory problems may require both interprocessor communication and file I/O. Thus, in order to improve I/O performance, it is necessary to minimize the I/O costs associated with a communication step. We present three methods for performing communication in out-of-core distributed memory problems. The first method, called the generalized collective communication method, follows a loosely synchronous model; computation and communication phases are clearly separated, and communication requires permutation of data in files. The second method, called the receiver-driven in-core communication, communicates only the in-core data. The third method, called the owner-driven in-core communication, goes even one step further and tries to identify the potential future use of data (by the recipients) while it is in the sender's memory. We provide performance results for two out-of-core applications: the two-dimensional FFT code, and the two-dimensional elliptic Jacobi solver.

18.

An FFT Performance Model for Optimizing General-Purpose Processor Architecture

李玲, 陈云霁, 刘道福, 钱诚, 胡伟武 《计算机科学技术学报》 2011, 26(5): 875-889

The general-purpose processor (GPP) is an important platform for the fast Fourier transform (FFT) due to its flexibility, reliability, and practicality. FFT is a representative application that is intensive in both computation and memory access, so optimizing the FFT performance of a GPP also benefits the performance of many other applications. To facilitate the analysis of FFT, this paper proposes a theoretical model of FFT processing. The model gives a tight lower bound on the runtime of FFT on a GPP and guides the architecture optimization for GPPs as well. Based on the model, two theorems on the optimization of architecture parameters are deduced, which refer to the lower bounds of register number and memory bandwidth. Experimental results on different processor architectures (including Intel Core i7 and Godson-3B) validate the performance model. The above investigations were adopted in the development of Godson-3B, which is an industrial GPP. The optimization techniques deduced from our performance model improve the FFT performance by about 40% while incurring only 0.8% additional area cost. Consequently, Godson-3B solves the 1024-point single-precision complex FFT in 0.368 μs with about 40 Watt power consumption, and has the highest performance-per-watt in complex FFT among processors as far as we know. This work could benefit optimization of other GPPs as well.

19.

Portable parallel FFT for MIMD multiprocessors

Amir Averbuch Eran Gabber 《Concurrency and Computation》1998,10(8):583-605

A portable parallelization of the Cooley–Tukey FFT algorithm for MIMD multiprocessors is presented. The implementation uses the virtual machine for multiprocessors (VMMP) and PVM portable software packages. Since VMMP provides the same set of services on all target machines, a single version of the parallel FFT code was used for shared memory (25-processor Sequent Symmetry), shared bus (MOS-running distributed UNIX) and distributed memory multiprocessor (transputer network and 64-processor IBM SP2). It is accompanied by a detailed performance analysis of the implementations. The algorithm achieved high efficiencies on all target machines. The analysis indicates that most overheads are caused by the target architecture and not by VMMP or PVM inefficiencies. The portability analysis of the FFT provides several important insights. On the message passing architecture, the parallel FFT algorithm can obtain linearly increasing speedup with respect to the number of processors with only a moderate increase in the problem size. The parallel FFT can be executed by any number of processors, but generally the number of processors is much less than the length of the input data. The results indicate that the parallel FFT is portable: it achieves very good speedups on either a shared memory multiprocessor with high memory bandwidth or on a message passing multiprocessor without any change in the programs. © 1998 John Wiley & Sons, Ltd.

20.

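A Conflict-Free Address Generation Method for FFT Processors  Total citations: 8, self-citations: 2, citations by others: 6

马余泰 《计算机学报》 1995, 18(11): 875-880

This paper proposes a new conflict-free address generation method that lets the butterfly unit read two operands simultaneously in a single cycle. Eliminating the address parity-check circuit simplifies the memory bank control logic and also speeds up input/output address generation. The method applies equally to radix-4 FFT processors.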