期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

戴亦奇《电子与封装》2011,11(12):8-11

文章提出了一种以基-22／23为基础的流水线结构,用以实现低成本、超大规模集成电路（VLSI）的快速傅里叶变换（FFT）处理器设计。该处理器在减少普通复数乘法器级数的同时,通过单路延时反馈（SDF）存取方式,以最少的存储字来获得FFT结果。对于数据通路,我们采用了混合浮点的数据缩放方式,在保证信噪比的同时,降低了数据长... 相似文献

2.

A radix-8 wafer scale FFT processor 总被引：2，自引：0，他引：2

Earl E. Swartzlander Jr. Vijay K. Jain Hiroomi Hikawa 《The Journal of VLSI Signal Processing》1992,4(2-3):165-176

Wafer Scale Integration promises radical improvements in the performance of digital signal processing systems. This paper describes the design of a radix-8 systolic (pipeline) fast Fourier transform processor for implementation with wafer scale integration. By the use of the radix-8 FFT butterfly wafer that is currently under development, continuous data rates of 160 MSPS are anticipated for FFTs of up to 4096 points with 16-bit fixed point data. 相似文献

3.

基于无冲突地址生成的高性能FFT处理器设计

王江黑勇郑晓燕仇玉林《微电子学与计算机》2007,24(3):15-19

提出一种基于存储器交织架构的FFT处理器设计方法,并且针对基-8FFT提出一种无冲突地址生成算法,数据按帧进行操作。每个存储器均划分为8个独立的存储体,通过对循环移位寄存器译码,蝶式运算单元并行无冲突读写操作数,8通道输入数据进行并行的复数乘法运算。每级运算引入完全流水,减少了运算的时钟周期开销,同时推导出局部流水线设计必须满足的不等式条件。输入、输出存储器采用乒乓操作,按帧轮换,FFT运算连续输入、输出,采样频率与系统工作频率一致,具有很好的实时性,运算精度通过块浮点得到保证。该设计方法可以扩展至基-16FFT处理器设计。相似文献

4.

Design Space Exploration of 1-D FFT Processor

Shaohan Liu Dake Liu 《Journal of Signal Processing Systems》2018,90(11):1609-1621

A design space exploration methodology of 1-D FFT processor is proposed to find the best hardware architecture in a quantitative way during early design. The methodology includes architecture candidate collection, coarse-grained architecture selection, and circuit level design optimizations. We show how to select a better architecture from candidates including different architectures (SDF, SDC, MDF, MDC and memory-based) with different degree of parallelism at different radices. The sub-level designs, including designs of rotator and data scaling module, are introduced for further optimizations. As a proof of concept, an FFT processor for 4G, WLAN and future 5G is designed supporting 16-4096 and 12-2400 point FFTs. Memory-based architecture with 16-datapath mixed-radix butterfly unit is selected to satisfy the demands for 1GS/s (4096) throughput. The synthesis result based on 65nm technology shows that the silicon cost and power consumption are 1.46mm2 and 68.64mW respectively. The proposed processor has better normalized throughput per area unit and normalized FFTs per energy unit than the state of the art available designs. 相似文献

5.

A chip set for pipeline and parallel pipeline FFT architectures

V. Szwarc L. Desormeaux W. Wong C. P. S. Yeung C. H. Chan T. A. Kwasniewski 《The Journal of VLSI Signal Processing》1994,8(3):253-265

A chip set for pipelined and parallel pipelined FFT applications is presented. The set consists of two cascadeable chips with built-in self-test and a chip-interconnectivity test feature. The two ASICs are a 15k gate Complex-Butterfly and a 9k gate FFT Switch. The Complex-Butterfly uses redundant binary arithmetic (RBA), a modified Booth algorithm and a Wallace tree architecture to achieve a throughput of better than 25 Msamples/sec. The cascadeable FFT Switch is designed to support the implementation of radix-2, 2^N point, pipeline FFTs. Both devices have been fabricated in 1.5m CMOS gate array technology. 相似文献

6.

基于FPGA的高速实时FFT处理器设计 总被引：5，自引：0，他引：5

周海斌刘刚《电子工程师》2005,31(1):54-56

结合高速、实时快速傅里叶变换(FFT)的实际需求,在分析了基4、按频率抽取(DIF)FFT算法的基础上,采用多级串行的同步流水线结构,利用现场可编程门阵列(FPGA)完成1 024点、16位复数点、块浮点FFT.整个设计划分成多个功能模块,全部采用Verilog HDL描述,并在Virtex-Ⅱ器件上实现.结果表明,利用FPGA实现复杂的数字信号处理(DSP)算法是完全可行的. 相似文献

7.

Pseudo-Parallel Datapath Structure for Power Optimal Implementation of 128-pt FFT/IFFT for WPAN

J. Mathew K. Maharatna B. R. Jose H. Rahaman D. K. Pradhan 《Circuits, Systems, and Signal Processing》2011,30(4):871-882

An optimal implementation of 128-Pt FFT/IFFT for low power IEEE 802.15.3a WPAN using pseudo-parallel datapath structure is presented, where the 128-Pt FFT is devolved into 8-Pt and 16-Pt FFTs and then once again by devolving the 16-Pt FFT into 4×4 and 2×8. We analyze 128-Pt FFT/IFFT architecture for various pseudo-parallel 8-Pt and 16-Pt FFTs and an optimum datapath architecture is explored. It is suggested that there exists an optimum degree of parallelism for the given algorithm. The analysis demonstrated that with a modest increase in area one can achieve significant reduction in power. The proposed architectures complete one parallel-to-parallel (i.e., when all input data are available in parallel and all output data are generated in parallel) 128-point FFT computation in less than 312.5 ns and thereby meet the standard specification. The relative merits and demerits of these architectures have been analyzed from the algorithm as well as implementation point of view. Detailed power analysis of each of the architectures with a different number of data paths at block level is described. We found that from power perspective the architecture with eight datapaths is optimum. The core power consumption with optimum case is 60.6 MW which is only less than half of the latest reported 128-point FFT design in 0.18u technology. Furthermore, a Single Event Upset (SEU) tolerant scheme for registers is also explored. The SEU tolerant scheme will not affect the performance, however, there is an increase power consumption of about 42 percent. Apart from the low power consumption, the advantages of the proposed architectures include reduced hardware complexity, regular data flow and simple counter based control. 相似文献

8.

A Single Chip Digital Signal Processor and Its Application to Real-Time Speech Analysis

《Solid-State Circuits, IEEE Journal of》1983,18(1):91-99

A single chip high-performance digital signal processor (HSP) has been developed for speech, telecommunication, and other applications. The HSP uses 3 µm CMOS technology and its architecture features floating point arithmetic and pipeline structure. By adoption of floating point arithmetic, data covering a wide dynamic range (up to 32 bits) can be manipulated. The input clock frequency is 16 MHz, and the instruction cycle time is 250 ns. Efficient signal processing instructions and a large internal memory (program ROM: 512 words; data RAM: 200 words; data ROM: 128 words) make it possible to construct a compact speech analysis circuit by the LPC (PARCOR) method with two HSP's. This paper describes HSP architecture, LSI design, and a speech analysis application. 相似文献

9.

基于FPGA的大时宽带宽积频域脉压设计

汪灏洪一《现代电子技术》2007,30(18):73-75

主要介绍基于Altera公司FPGA器件的高速实时FFT运算单元实现及频率域脉冲压缩处理的设计方法。在分析基8、按频率抽取FFT算法的基础上,采用多级同步流水线结构,利用现场可编程门阵列(FPGA)完成最大4 096点块浮点FFT。整个设计划分成多个功能模块,采用VHDL描述语言,并在Stratix器件上实现。结果表明,利用FPGA实现复杂的数字信号处理(DSP)算法是完全可行的。相似文献

10.

多点数高精度低存储的FFT处理器设计

王祯韩泽耀《信息技术》2008,32(3):34-37

提出了一种新的基于基23算法单路径反馈流水线结构的FFT处理器.通过对数据通路的动态调整,解除了变换点数必须是8的幂次的限制,可高效实现任意2n点的FFT/IFFT变换.并将自定义浮点格式引入流水线,同时在流水线输入端添加预处理单元,在不引入过多逻辑的情况下,有效的提高了FFT的变换精度,同时存储器的使用量降低10%. 相似文献

11.

FFTs with Near-Optimal Memory Access Through Block Data Layouts: Algorithm,Architecture and Design Automation

Berkin Akin Franz Franchetti James C. Hoe 《Journal of Signal Processing Systems》2016,85(1):67-82

Fast Fourier transform algorithms on large data sets achieve poor performance on various platforms because of the inefficient strided memory access patterns. These inefficient access patterns need to be reshaped to achieve high performance implementations. In this paper we formally restructure 1D, 2D and 3D FFTs targeting a generic machine model with a two-level memory hierarchy requiring block data transfers, and derive memory access pattern efficient algorithms using custom block data layouts. These algorithms need to be carefully mapped to the targeted platform’s architecture, particularly the memory subsystem, to fully utilize performance and energy efficiency potentials. Using the Kronecker product formalism, we integrate our optimizations into Spiral framework and evaluate a family of DRAM-optimized FFT algorithms and their hardware implementation design space via automated techniques. In our evaluations, we demonstrate DRAM-optimized accelerator designs over a large tradeoff space given various problem (single/double precision 1D, 2D and 3D FFTs) and hardware platform (off-chip DRAM, 3D-stacked DRAM, ASIC, FPGA, etc.) parameters. We show that Spiral generated pareto optimal designs can achieve close to theoretical peak performance of the targeted platform offering 6x and 6.5x system performance and power efficiency improvements respectively over conventional row-column FFT algorithms. 相似文献

12.

An expandable column fft architecture using circuit switching networks

Tom Chen Li Zhu 《The Journal of VLSI Signal Processing》1993,6(3):243-257

The Fast Fourier Transform (FFT) is widely used in various digital signal processing applications. The performance requirements for FFT in modern real-time applications has increased dramatically due to the high demand on capacity and performance of modern telecommunication systems, where FFT plays a major role. Software implementations of FFT running on a general purpose computer can no longer meet current speed requirements. However, recent advances in VLSI technology have made it possible to implement the entire FFT system on a single silicon substrate. This article presents a column FFT design suitable for ULSI (Ultra Large Scale Integration) implementations. The basic building block is a 64-point column FFT. FFTs with longer transform lengths can be easily realized using the 64-point column FFT building block. The butterfly processors in the column FFT are connected using circuit switching networks. The circuit switching networks not only provide dynamically recon-figurable interconnections among the butterfly processors, but also provide a fault-tolerant capability. Bit-serial arithmetic is used in the architecture. Assuming the data word length is 16 bits, the 1024-point column FFT engine proposed in this article is capable of processing 1024 complex data samples in 533 clock cycles. If the clock frequency is 40 MHz, it will take 13.3 µs to complete a 1024-point FFT. 相似文献

13.

A Hardware Acceleration Platform for Digital Holographic Imaging

Thomas Lenart Mats Gustafsson Viktor Öwall 《Journal of Signal Processing Systems》2008,52(3):297-311

This paper presents a hardware acceleration platform for image reconstruction in digital holographic imaging. The hardware accelerator executes a computationally demanding reconstruction algorithm which transforms an interference pattern captured on a digital image sensor into visible images. Focus in this work is to maximize computational efficiency, and to minimize the external memory transfer overhead, as well as required internal buffering. The paper presents an efficient processing datapath with a fast transpose unit and an interleaved memory storage scheme. The proposed architecture results in a speedup with a factor 3 compared with the traditional column/row approach for calculating the two-dimensional FFT. Memory sharing between the computational units reduces the on-chip memory requirements with over 50%. The custom hardware accelerator, extended with a microprocessor and a memory controller, has been implemented on a custom designed FPGA platform and integrated in a holographic microscope to reconstruct images. The proposed architecture targeting a 0.13 µm CMOS standard cell library achieves real-time image reconstruction with 20 frames per second. 相似文献

14.

High‐Performance Low‐Power FFT Cores

Wei Han Ahmet T. Erdogan Tughrul Arslan Mohd. Hasan 《ETRI Journal》2008,30(3):451-460

Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are popularly employed in both traditional research fields, such as satellite communications, and thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high throughput and power efficient FFT cores. Different combinations of hybrid low‐power techniques are exploited to reduce power consumption, such as multiplierless units which replace the complex multipliers in FFTs, low‐power commutators based on an advanced interconnection, and parallel‐pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that up to 38% and 55% power savings can be achieved by the proposed pipelined FFTs and parallel‐pipelined FFTs respectively, compared to the conventional pipelined FFT processor architectures. 相似文献

15.

An Efficient Pipeline Architecture and Memory Bit-Width Analysis for Discrete Wavelet Transform of the 9/7 Filter for JPEG 2000

Chung-Fu Lin Pei-Kung Huang Bing-Fei Wu 《Journal of Signal Processing Systems》2010,59(3):245-253

In this paper, we propose an efficient pipeline architecture for the DWT 9/7 filter defined in JPEG 2000. The proposed architecture is composed of column and row processors to perform the separable 2-D DWT. Based on the rescheduling DWT algorithm, we derive a new data flow graph to shorten the critical path. The proposed 1-D column processor requires less pipeline registers to achieve about the same critical path compared with other lifting-based architectures. For the row processor, the data dependency of each lifting step is reduced to only two computation nodes and therefore more pipeline registers can be applied to achieve higher processing speed without increasing the internal memory size in the 2-D case. That is, for an N × N image, it only requires 4N internal memory to perform the row-wise transform. For the memory bit-width analysis, we use software simulation to reduce the memory bit-width for various compression ratios. Since a portion of information from least significant bits of DWT coefficients would be discarded after EBCOT-tier2 processing, one can decrease the data width of internal memory to perform various compression ratios of JPEG 2000 coding, especially at the low-bit rates. Our simulation results suggest that it is practically possible to design the energy-aware memory architecture to further reduce the power consumption in the future work. 相似文献

16.

一种无线局域网专用高性能FFT处理器设计

下载免费PDF全文

王江黑勇郑晓燕仇玉林《电子器件》2007,30(2):475-480

针对基8算法提出一种无冲突地址生成方法,设计了802.11a专用FFT处理器,整体采用流水处理,实现了一种高性能FFT硬件架构,各级RAM采用乒乓操作,每个RAM均由8个独立的SRAM存储体组成,通过对循环移位寄存器译码,蝶算单元并行无冲突读写RAM操作数,8通道输入数据并行处理,每级运算所需的时钟周期大幅度降低.FFT运算连续输入、输出,数据运算精度通过块浮点得到保证.整体具有高速、高精度的特征.本文提出的无冲突地址生成方法也可以扩展至高点数FFT的应用. 相似文献

17.

基于CMMB标准的新型FFT处理器设计

廖敏超杨波陈昕《电视技术》2009,33(Z1)

针对中国移动多媒体广播(CMMB)系统中高速FFT处理器的设计要求,提出了一种新的适用大点数FFT算法的流水线实现结构.采用了混合基4/2、按频率抽取FFT算法,完成了4 096/2 048点,13 bit位宽,定点复数FFr的设计,两个点数的FFT变换能够采用同一套结构实现,节约了资源.设计全部采用VerilogHDL语言描述并通过FPGA仿真验证. 相似文献

18.

Multi-mode parallel and folded VLSI architectures for 1D-fast Fourier transform

《Integration, the VLSI Journal》2016

The modern real time applications like orthogonal frequency division multiplexing and etc., demand high performance fast Fourier transform (FFT) design with less area and clock cycles. This paper proposes efficient FFT VLSI architectures using folded/parallel implementation. In the proposed folded FFT architecture, the number of cycles required to complete the operation is less than single path delay feedback (SDF)/multi-path delay commutator (MDC) architectures. In the proposed parallel FFT architecture, N-point FFT is implemented by using one N/2-point FFT without much extra hardware. Both the proposed architectures are implemented for radix-2, 2², and 4 using 45 nm technology library. The proposed parallel architecture achieves 56.7% and 40.6% of area reduction as compared with the existing parallel architecture based 16-point radix-2 and radix-2² DIF FFTs respectively. The proposed folded architecture achieves 65.5%, 51.1%, and 35.8% of worst path delay reduction as compared with the existing SDF based 16-point radix-2, radix-2², and radix-4 DIF FFTs respectively. 相似文献

19.

高效可配置浮点FFT处理器设计

桑红石高伟《微电子学与计算机》2012,29(4):36-40

为了克服高精度浮点FFT处理器具有较大资源开销的设计瓶颈,采用基于单口存储器的FIFO构建共享蝶形结构的R2/22SDF流水可配置结构.采用适合浮点设计的基2/22算法实现流水结构,不仅有利于可配置电路的实现,还能够有效减少复数乘法次数,提高复数乘法器的计算效率.采用双倍数据位宽的单口存储器实现FIFO存储器,有效避免了双口存储器面积和功耗较大的问题.改进的蝶形共享结构实现两级蝶形的合并,解决了单路径延迟反馈流水线结构蝶形单元利用率低的问题.与传统流水线结构FFT处理器设计相比,有效降低了浮点设计中的资源开销,提高了计算单元的利用效率. 相似文献

20.

一种低复杂度的通用FFT处理器

周敏余松煜归琳《电视技术》2005,(8):35-37

利用R22SDF算法的低复杂度的特点,在其基础上演变出一种通用的FFT算法.该方法可适用于所有的2n点FFT运算.该算法采用流水线结构,以满足数据实时性处理的要求. 相似文献