首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A radix-8 wafer scale FFT processor   总被引:2,自引:0,他引:2  
Wafer Scale Integration promises radical improvements in the performance of digital signal processing systems. This paper describes the design of a radix-8 systolic (pipeline) fast Fourier transform processor for implementation with wafer scale integration. By the use of the radix-8 FFT butterfly wafer that is currently under development, continuous data rates of 160 MSPS are anticipated for FFTs of up to 4096 points with 16-bit fixed point data.  相似文献   

2.
A 2.4-Gsample/s DVFS FFT Processor for MIMO OFDM Communication Systems   总被引:1,自引:0,他引:1  
This paper presents a new dynamic voltage and frequency scaling (DVFS) FFT processor for MIMO OFDM applications. By the proposed multimode multipath-delay-feedback (MMDF) architecture, our FFT processor can process 1-8-stream 256-point FFTs or a high-speed 256-point FFT in two processing domains at minimum clock frequency for DVFS operations. A parallelized radix-24 FFT algorithm is also employed to save the power consumption and hardware cost of complex multipliers. Furthermore, a novel open-loop voltage detection and scaling (OLVDS) mechanism is proposed for fast and robust voltage management. With these schemes, the proposed FFT processor can operate at adequate voltage/frequency under different configurations to support the power-aware feature. A test chip of the proposed FFT processor has been fabricated using UMC 90 nm single-poly nine-metal CMOS process with a core area of 1.88 times1.88 mm2 . The SQNR performance of this FFT chip is over 35.8 dB for QPSK/16-QAM modulation. Power dissipation of 2.4 Gsample/s 256-point FFT computations is about 119.7 mW at 0.85 V. Depending on the operation mode, power can be saved by 18%-43% with voltage scaling in TT corner.  相似文献   

3.
This paper proposes a novel concept of adjusting the hardware size in a multi‐carrier code division multiple access (MC‐CDMA) receiver in real time as per the channel parameters such as delay spread, signal‐to‐noise ratio, transmission rate, and Doppler frequency. The fast Fourier transform (FFT) or inverse FFT (IFFT) size in orthogonal frequency division multiplexing (OFDM)/MC‐CDMA transceivers varies from 1024 points to 16 points. Two low‐power reconfigurable radix‐4 256‐point FFT processor architectures are proposed that can also be dynamically configured as 64‐point and 16‐point as per the channel parameters to prove the concept. By tailoring the clock of the higher FFT stages for longer FFTs and switching to shorter FFTs from longer FFTs, significant power saving is achieved. In addition, two 256 sub‐carrier MC‐CDMA receiver architectures are proposed which can also be configured for 64 sub‐carriers in real time to prove the feasibility of the concept over the whole receiver.  相似文献   

4.
We present a novel 4096 complex-point, fully systolic VLSI FFT architecture based on the combination of three consecutive radix-4 stages resulting in a 64-point FFT engine. The outcome of cascading these 64-point FFT engines is an improved architecture that efficiently processes large input data sets in real time. Using 64-point FFT engines reduces the buffering and the latency to one third of a fully unfolded radix-4 architecture, while the radix-4 schema simplifies the calculations within each engine. The proposed 4096 complex point architecture has been implemented on a FPGA achieving a post-route clock frequency of 200 MHz resulting in a sustained throughput of 4096 point/20.48 μs. It has also been implemented on a high performance 0.13 μm, 1P8M CMOS process achieving a worst-case (0.9 V, 125 C) post-route clock frequency of 604.5 MHz and a sustained throughput of 4096 point/3.89 μs while consuming 4.4 W. The architecture is extended to accomplish FFT computations of 16K, 64K and 256K complex points with 352, 256 and 188 MHz operating frequencies respectively.  相似文献   

5.
In this paper, we present a novel 128/64 point fast Fourier transform (FFT)/ inverse FFT (IFFT) processor for the applications in a multiple-input multiple-output orthogonal frequency-division multiplexing based IEEE 802.11n wireless local area network baseband processor. The unfolding mixed-radix multipath delay feedback FFT architecture is proposed to efficiently deal with multiple data sequences. The proposed processor not only supports the operation of FFT/IFFT in 128 points and 64 points but can also provide different throughput rates for 1-4 simultaneous data sequences to meet IEEE 802.11n requirements. Furthermore, less hardware complexity is needed in our design compared with traditional four-parallel approach. The proposed FFT/IFFT processor is designed in a 0.13-mum single-poly and eight-metal CMOS process. The core area is 660times2142 mum2 , including an FFT/IFFT processor and a test module. At the operation clock rate of 40 MHz, our proposed processor can calculate 128-point FFT with four independent data sequences within 3.2 mus meeting IEEE 802.11n standard requirements  相似文献   

6.
The modern real time applications like orthogonal frequency division multiplexing and etc., demand high performance fast Fourier transform (FFT) design with less area and clock cycles. This paper proposes efficient FFT VLSI architectures using folded/parallel implementation. In the proposed folded FFT architecture, the number of cycles required to complete the operation is less than single path delay feedback (SDF)/multi-path delay commutator (MDC) architectures. In the proposed parallel FFT architecture, N-point FFT is implemented by using one N/2-point FFT without much extra hardware. Both the proposed architectures are implemented for radix-2, 22, and 4 using 45 nm technology library. The proposed parallel architecture achieves 56.7% and 40.6% of area reduction as compared with the existing parallel architecture based 16-point radix-2 and radix-22 DIF FFTs respectively. The proposed folded architecture achieves 65.5%, 51.1%, and 35.8% of worst path delay reduction as compared with the existing SDF based 16-point radix-2, radix-22, and radix-4 DIF FFTs respectively.  相似文献   

7.
Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are popularly employed in both traditional research fields, such as satellite communications, and thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high throughput and power efficient FFT cores. Different combinations of hybrid low‐power techniques are exploited to reduce power consumption, such as multiplierless units which replace the complex multipliers in FFTs, low‐power commutators based on an advanced interconnection, and parallel‐pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that up to 38% and 55% power savings can be achieved by the proposed pipelined FFTs and parallel‐pipelined FFTs respectively, compared to the conventional pipelined FFT processor architectures.  相似文献   

8.
Fast Fourier transform algorithms on large data sets achieve poor performance on various platforms because of the inefficient strided memory access patterns. These inefficient access patterns need to be reshaped to achieve high performance implementations. In this paper we formally restructure 1D, 2D and 3D FFTs targeting a generic machine model with a two-level memory hierarchy requiring block data transfers, and derive memory access pattern efficient algorithms using custom block data layouts. These algorithms need to be carefully mapped to the targeted platform’s architecture, particularly the memory subsystem, to fully utilize performance and energy efficiency potentials. Using the Kronecker product formalism, we integrate our optimizations into Spiral framework and evaluate a family of DRAM-optimized FFT algorithms and their hardware implementation design space via automated techniques. In our evaluations, we demonstrate DRAM-optimized accelerator designs over a large tradeoff space given various problem (single/double precision 1D, 2D and 3D FFTs) and hardware platform (off-chip DRAM, 3D-stacked DRAM, ASIC, FPGA, etc.) parameters. We show that Spiral generated pareto optimal designs can achieve close to theoretical peak performance of the targeted platform offering 6x and 6.5x system performance and power efficiency improvements respectively over conventional row-column FFT algorithms.  相似文献   

9.
The design of Fast Fourier Transform (FFT) integrated architectures for System-on-Chip (SoC) telecom applications is addressed in this paper. After reviewing the FFT processing requirements of wireless and wired Orthogonal Frequency Division Multiplexing (OFDM) standards, including the emerging Multiple Input Multiple Output (MIMO) and OFDM Access (OFDMA) schemes, three FFT architectures are proposed: a fully parallel, a pipelined cascade and an in-place variable-size architecture, which offer different trade-offs among flexibility, processing speed and complexity. Silicon implementation results and comparisons with the state-of-the-art prove that each macrocell outperforms the known works for a target application. The fully parallel is optimized for throughput requirements up to several GSamples/s enabling Ultra-wideband (UWB) communications by using all channels foreseen in the standard. The pipelined cascade macrocell minimizes complexity for large size FFTs sustaining throughput up to 100 MSamples/s. The in-place variable-size FFT macrocell stands for its flexibility by allowing run-time reconfigurability required in OFDMA schemes while attaining the required throughput to support MIMO communications. The three architectures are also compared with common case-studies and target technology.  相似文献   

10.
设计了一种应用于802.11a的64点FFT/IFFT处理器.采用单蝶形4路并行结构,提出了4路并行无冲突地址产生方法,有效地提高了吞吐率,完成64点FFT/IFFT运算只需63个时钟周期.提出的RAM双乒乓结构实现了对输入和输出均为连续数据流的缓存处理.不仅能实现64点FFT和IFFT,而且位宽可以根据系统任意配置.为了提高数据运算的精度,设计采用了块浮点算法,实现了精度与资源的折中.16位位宽时,在HJTC 0.18μmCMOS工艺下综合,内核面积为:0.626 7 mm2,芯片面积为:1.35 mm×1.27 mm,最高工作频率可达300 MHz,功耗为126.17 mW.  相似文献   

11.
A 1-GS/s FFT/IFFT processor for UWB applications   总被引:1,自引:0,他引:1  
In this paper, we present a novel 128-point FFT/IFFT processor for ultrawideband (UWB) systems. The proposed pipelined FFT architecture, called mixed-radix multipath delay feedback (MRMDF), can provide a higher throughput rate by using the multidata-path scheme. Furthermore, the hardware costs of memory and complex multipliers in MRMDF are only 38.9% and 44.8% of those in the known FFT processor by means of the delay feedback and the data scheduling approaches. The high-radix FFT algorithm is also realized in our processor to reduce the number of complex multiplications. A test chip for the UWB system has been designed and fabricated using 0.18-/spl mu/m single-poly and six-metal CMOS process with a core area of 1.76/spl times/1.76 mm/sup 2/, including an FFT/IFFT processor and a test module. The throughput rate of this fabricated FFT processor is up to 1 Gsample/s while it consumes 175 mW. Power dissipation is 77.6 mW when its throughput rate meets UWB standard in which the FFT throughput rate is 409.6 Msample/s.  相似文献   

12.
高吞吐浮点可灵活重构的快速傅里叶变换(FFT)处理器可满足尖端雷达实时成像和高精度科学计算等多种应用需求。与定点FFT相比,浮点运算复杂度更高,使得浮点型FFT的运算吞吐率与其实现面积、功耗之间的矛盾问题尤为突出。鉴于此,为降低运算复杂度,首先将大点数FFT分解成若干个小点数基2k 级联子级实现,提出分别针对128/256/512/1024/2048点FFT的优化混合基算法。同时,结合所提出同时支持单通道单精度和双通道半精度两种浮点模式的新型融合加减与点乘运算单元,首次提出一款高吞吐率双模浮点可变点FFT处理器结构,并在28 nm标准CMOS工艺下进行设计并实现。实验结果表明,单通道单精度和双通道半精度浮点两种模式下的运算吞吐率和输出平均信号量化噪声比分别为3.478 GSample/s, 135 dB和6.957 GSample/s, 60 dB。归一化吞吐率面积比相比于现有其他浮点FFT实现可提高约12倍。  相似文献   

13.
A single-chip reconfigurable FFT/IFFT processor that employs a ring-structured multiprocessor architecture is presented. Multi-level reconfigurability is realized by dynamically allocating computation resources needed by specific applications. The processor IC was fabricated in 0.25-/spl mu/m CMOS. It performs 8-point to 4096-point complex FFT/IFFT with power-consumption scalability and provides useful trade-offs between algorithm flexibility, implementation complexity and energy efficiency.  相似文献   

14.
陆晓凤  刘锋  佟冬  王克义 《电子学报》2011,39(5):1072-1076
本文针对H.264 Fidelity Range Extensions(FRExt,High Profile)解码过程中扩展的所有变换,采用二维矩阵分解和基于矩阵运算提取公共因子的操作,利用通用运算单元来设计高效的可重构VLSI结构.该结构不但节省面积(可重构变换结构只消耗了4807门电路),并且具有高性能(采用TSM...  相似文献   

15.
《Microelectronics Journal》2015,46(5):370-376
This work presents an energy efficient architecture for an anti-traffic noise system. The hardware is designed for a road side unit (RSU) in intelligent transportation systems. Fast Fourier Transform is the cornerstone for the suggested system. An ultra low power architecture for the FFT suitable for FPGA implementation is derived. Bit-widths for both data and twiddle factors are optimized for low-power. The architecture uses an efficient complex multiplier that has 25% less multiplications. An algorithm to compute the number of time-shared butterflies for a given FFT block size and a target throughput is elaborated. Finally synthesis results using fixed-point VHDL library and commercial IP are presented and compared with the proposed FFT processor.  相似文献   

16.
An optimal implementation of 128-Pt FFT/IFFT for low power IEEE 802.15.3a WPAN using pseudo-parallel datapath structure is presented, where the 128-Pt FFT is devolved into 8-Pt and 16-Pt FFTs and then once again by devolving the 16-Pt FFT into 4×4 and 2×8. We analyze 128-Pt FFT/IFFT architecture for various pseudo-parallel 8-Pt and 16-Pt FFTs and an optimum datapath architecture is explored. It is suggested that there exists an optimum degree of parallelism for the given algorithm. The analysis demonstrated that with a modest increase in area one can achieve significant reduction in power. The proposed architectures complete one parallel-to-parallel (i.e., when all input data are available in parallel and all output data are generated in parallel) 128-point FFT computation in less than 312.5 ns and thereby meet the standard specification. The relative merits and demerits of these architectures have been analyzed from the algorithm as well as implementation point of view. Detailed power analysis of each of the architectures with a different number of data paths at block level is described. We found that from power perspective the architecture with eight datapaths is optimum. The core power consumption with optimum case is 60.6 MW which is only less than half of the latest reported 128-point FFT design in 0.18u technology. Furthermore, a Single Event Upset (SEU) tolerant scheme for registers is also explored. The SEU tolerant scheme will not affect the performance, however, there is an increase power consumption of about 42 percent. Apart from the low power consumption, the advantages of the proposed architectures include reduced hardware complexity, regular data flow and simple counter based control.  相似文献   

17.
Two 128-point 16-bit radix-2 FFT/IFFT processors based on synchronous-logic (sync) and asynchronous-logic (async) for low voltage (1.1-1.4 V) energy-critical low-speed hearing aids are described. The two processors herein are designed with the same function and similar architecture, and the emphasis is energy efficacy. The async approach, on average, features ~37% lower energy per FFT/IFFT computation than the sync approach but with ~10% larger IC area penalty and an inconsequential 1.4 times worse delay; the async design can be designed to be 0.24 times faster and with largely the same energy dissipation if the matched delay elements and the latch controllers therein are better optimized. In this low-speed application, the lower energy feature of the async design is not attributed to the absence of the clock infrastructure but instead due to the adoption of established and proposed async circuit designs, resulting in reduced redundant operations and reduced spurious/glitch switching, and to the use of latches. The prototype async FFT/IFFT processor (in a 0.35-mum CMOS process) can be operated at 1.0 V and dissipates 93 nJ.  相似文献   

18.
设计实现了基于FPGA的256点定点FFT处理器。处理器以基-2算法为基础,通过采用高效的两路输入移位寄存器流水线结构,有效提高了碟形运算单元的运算效率,减少了寄存器资源的使用,提高了最大工作频率,增大了数据吞吐量,并且使得处理器具有良好的可扩展性。详细描述了具体设计的算法结构和各个模块的实现。设计采用Verilog HDL作为硬件描述语言,采用QuartusⅡ设计仿真工具进行设计、综合和仿真,仿真结果表明,处理器工作频率为72 MHz,是一种高效的FFT处理器IP核。  相似文献   

19.
描述了一种高效的FFT(fast Fourier transform)流水线结构,采用这种流水线结构不仅能提高数据速率,而且能有效减小设计的规模.作为OFDM(orthogonal frequency division multiplexing)系统实现的关键部分,FFT的设计关系到整个系统的实现规模.作为应用之一,笔者在DVB-T接收机中采用了这种FFT结构,实现了对2K/8K双模式的解调.该结构还可方便地应用到其他应用FFT的场合,且易于实现多种模式的并存.  相似文献   

20.
The problem of an efficient very large scale integration (VLSI) realization of the direct/inverse fast Fourier transform (FFT/IFFT) for digital subscriber line (DSL) applications is addressed in this paper. The design of scalable and very high-rate (VDSL) modem claims for large and high-throughput complex FFT computations while for massive and fast deployment of the xDSL family low-cost and low-power constraints are key issues. Throughout the paper we explore the design space at different levels (algorithm, arithmetic accuracy, architecture, technology) to achieve the best trade-off between processing performance, hardware complexity and power consumption. A programmable VLSI processor based on a FFT/IFFT cascade architecture plus pre/post-processing stages is discussed and characterized from the high-level choices down to the gate-level synthesis. Furthermore low-power design techniques, based on clock gating and data driven switching activity reduction, are used to decrease the power consumption exploiting the correlation of the FFT/IFFT coefficients and the statistics of the input signals. To this aim both frequency-division and time-division duplex schemes have been considered. The effects of supply voltage scaling and its consequence on circuit performance are examined in detail, as well as the use of different target technologies. Synthesis results for a 0.18 μm CMOS standard-cells technology show that the processor is suitable for real-time modulation and demodulation in scalable full-rate VDSL modem (64-4096 complex FFT, 20 Msample/s) with a power consumption of few tens of mW. These performances are very interesting when compared to state-of-the-art software implementations and custom VLSI ones.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号