Similar literature
 Found 20 similar documents; search took 250 ms
1.
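This paper describes an efficient pipelined FFT (fast Fourier transform) architecture that both raises the data rate and effectively reduces the size of the design. As a key part of any OFDM (orthogonal frequency division multiplexing) system, the FFT design determines the implementation size of the whole system. As one application, the authors used this FFT architecture in a DVB-T receiver to demodulate both the 2K and 8K modes. The architecture can also be readily applied wherever an FFT is needed, and it makes it easy to support several modes side by side.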

2.
3.
A 2.4-Gsample/s DVFS FFT Processor for MIMO OFDM Communication Systems   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper presents a new dynamic voltage and frequency scaling (DVFS) FFT processor for MIMO OFDM applications. With the proposed multimode multipath-delay-feedback (MMDF) architecture, our FFT processor can process 1- to 8-stream 256-point FFTs or a high-speed 256-point FFT in two processing domains at minimum clock frequency for DVFS operations. A parallelized radix-2⁴ FFT algorithm is also employed to reduce the power consumption and hardware cost of complex multipliers. Furthermore, a novel open-loop voltage detection and scaling (OLVDS) mechanism is proposed for fast and robust voltage management. With these schemes, the proposed FFT processor can operate at adequate voltage/frequency under different configurations to support the power-aware feature. A test chip of the proposed FFT processor has been fabricated using the UMC 90 nm single-poly nine-metal CMOS process with a core area of 1.88 × 1.88 mm². The SQNR performance of this FFT chip is over 35.8 dB for QPSK/16-QAM modulation. Power dissipation of 2.4 Gsample/s 256-point FFT computations is about 119.7 mW at 0.85 V. Depending on the operation mode, power can be reduced by 18%-43% with voltage scaling in the TT corner.

4.
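Through a comparison and analysis of common algorithms, this paper presents the design and implementation of a multimode FFT processor capable of continuous processing, built on a mixed-radix, in-place algorithm with multiple memory banks. The FFT processor uses a data scaling method similar to block floating point. It is structurally simple, fast, and low-power with good performance; it meets the demands of high-speed computation while reducing hardware complexity, and it is easy to implement on an FPGA, which makes it suitable for multicarrier OFDM modulation systems.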
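
The abstract above mentions a scaling method similar to block floating point. As a rough illustration only, the following Python sketch shows plain block floating point on integer samples: one shared shift per block, chosen so the largest value fits the word. The function name `bfp_scale` and the 16-bit word width are assumptions made here, not details taken from the paper.

```python
def bfp_scale(block, word_bits=16):
    """Scale a block of integers by one shared right shift.

    Returns (scaled_block, shift). The shift is the smallest one for which
    the largest magnitude in the block fits a signed word of word_bits bits.
    This is generic block floating point, not the paper's exact scheme.
    """
    limit = (1 << (word_bits - 1)) - 1      # largest positive signed value
    peak = max(abs(v) for v in block)       # block-wide peak magnitude
    shift = 0
    while (peak >> shift) > limit:
        shift += 1
    # Python's >> on negative ints is an arithmetic (flooring) shift.
    return [v >> shift for v in block], shift


scaled, shared_shift = bfp_scale([70000, -1200, 5])
unchanged, no_shift = bfp_scale([100, -50])
```

In an FFT, a shift like this would typically be applied between stages, with the accumulated shift count kept as the block exponent.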

5.
FFT algorithms have memory access patterns that prevent many architectures from achieving high computational utilization, particularly when parallel processing is required to achieve the desired levels of performance. Starting with a highly efficient hybrid linear algebra/FFT core, we co-design the on-chip memory hierarchy, on-chip interconnect, and FFT algorithms for a multicore FFT processor. We show that it is possible to achieve excellent parallel scaling while maintaining power and area efficiency comparable to that of the single-core solution. The result is an architecture that can effectively use up to 16 hybrid cores for transform sizes that can be contained in on-chip SRAM. When configured with 12 MiB of on-chip SRAM, our technology evaluation shows that the proposed 16-core FFT accelerator should sustain 388 GFLOPS of nominal double-precision performance, with power and area efficiencies of 30 GFLOPS/W and 2.66 GFLOPS/mm², respectively.

6.
This paper presents architectures for supporting dynamic data scaling in pipeline fast Fourier transforms (FFTs), suitable when implementing large FFTs in applications such as digital video broadcasting and digital holographic imaging. In a pipeline FFT, data is continuously streaming and must, hence, be scaled without stalling the dataflow. We propose a hybrid floating-point scheme with a tailored exponent datapath, and a co-optimized architecture between hybrid floating point and block floating point (BFP) to reduce memory requirements for 2-D signal processing. The presented co-optimization generates a higher signal-to-quantization-noise ratio and requires less memory than, for instance, convergent BFP. A 2048-point pipeline FFT has been fabricated in a standard CMOS process from AMI Semiconductor (Lenart and Owall, 2003), and a field-programmable gate array prototype integrating a 2-D FFT core in a larger design shows that the architecture is suitable for image reconstruction in digital holographic imaging.

7.
We propose an energy-balanced allocation of a real-time application onto a single-hop cluster of homogeneous sensor nodes connected with multiple wireless channels. An epoch-based application consisting of a set of communicating tasks is considered. Each sensor node is equipped with discrete dynamic voltage scaling (DVS). The time and energy costs of both computation and communication activities are considered. We propose both an Integer Linear Programming (ILP) formulation and a polynomial-time 3-phase heuristic. Our simulation results show that for small-scale problems (with ≤10 tasks), up to 5x lifetime improvement is achieved by the ILP-based approach, compared with the baseline where no DVS is used. Also, the 3-phase heuristic achieves up to 63% of the system lifetime obtained by the ILP-based approach. For large-scale problems (with 60–100 tasks), up to 3.5x lifetime improvement can be achieved by the 3-phase heuristic. We also incorporate techniques for exploring the energy-latency tradeoffs of communication activities (such as modulation scaling), which lead to 10x lifetime improvement in our simulations. Simulations were further conducted for two real-world problems: LU factorization and the fast Fourier transform (FFT). Compared with the baseline where neither DVS nor modulation scaling is used, we observed up to 8x lifetime improvement for the LU factorization algorithm and up to 9x improvement for the FFT.

8.
Large-scale single-frequency networks are now being considered in Europe as very promising network topologies to achieve drastic savings in spectrum usage for digital terrestrial television transmission. Such networks are possible using the COFDM system, with large guard intervals (more than 200 μs) to absorb long echoes. In order to limit the spectral efficiency loss to about 20%, very long fast Fourier transforms (up to 8 K complex points) have to be performed in real time for the demodulation of every COFDM symbol (every 1 ms). This paper presents the first VLSI single chip dedicated to the computation of direct or inverse fast Fourier transforms of up to 8192 complex points. Due to its pipelined architecture, it can perform an 8 K FFT every 400 μs and a 1 K FFT every 50 μs. All the storage is on-chip, so that no external memories are required. A new internal result scaling technique, called convergent block floating point, has been introduced in order to minimize the required storage for a given quantization noise. The chip, 1 cm² in area with 1.5 million transistors, has been designed in a 3.3 V, 0.5 μm triple-level metal CMOS process and is fully functional. The 8 K complex FFT function could therefore be introduced in the coming years in digital terrestrial TV receivers at low cost.

9.
In this brief, a high-throughput and low-complexity fast Fourier transform (FFT) processor for wideband orthogonal frequency division multiplexing communication systems is presented. A new indexed-scaling method is proposed to reduce both the critical-path delay and hardware cost by employing a shorter wordlength. Together with the mixed-radix multipath delay feedback structure, the proposed FFT processor can achieve very high throughput with low hardware cost. Analysis shows that the proposed indexed-scaling method can save at least 11% of memory utilization compared to other state-of-the-art scaling algorithms. Also, a test chip of a 1.2 Gsample/s 2048-point FFT processor has been designed using the UMC 90-nm 1P9M process with a core area of 0.97 mm². The signal-to-quantization-noise ratio (SQNR) performance of this test chip is over 32.7 dB to support 16-QAM modulation, and the power consumption is about 117 mW at 300 MHz. Compared to fixed-point FFT processors, about 26% area and 28% power can be saved under the same throughput and SQNR specifications.

10.
The fast Fourier transform (FFT) plays an important role in orthogonal frequency division multiplexing (OFDM) communication systems. In this paper, we propose an area-efficient design of a variable-length FFT processor which can perform the FFT lengths of 512/1,024/2,048/4,096/8,192 points used in OFDM-based communication systems, such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T) and digital video broadcasting-handheld (DVB-H). To reduce computational complexity and chip area, we develop a new variable-length FFT architecture by devising a mixed-radix algorithm that consists of radix-2, radix-2² and radix-2/4/8 algorithms and by optimizing the realization through substructure sharing. Based on this architecture, an area-efficient design of a variable-length FFT processor is presented. Synthesized using the UMC 0.18 μm process, the processor has an area of 2.9 mm², and the 8,192-point FFT can be performed correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage.
Shuenn-Shyang Wang

11.
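DVB-T is the European digital terrestrial television standard based on OFDM. DVB-T defines two transmission modes, 2K and 8K, and four guard interval fractions (1/32, 1/16, 1/8, and 1/4), for eight combinations in total. When the receiver starts operating after power-up, it must first perform mode detection to determine the transmission mode and guard interval fraction used by the system. This paper proposes a downsampling-based algorithm for detecting the transmission mode and GI fraction, and uses F...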

12.
The complex-logarithmic number system (CLNS), which represents each complex point in log/polar coordinates, may be practical for implementing the Fast Fourier Transform (FFT). The roots of unity needed by the FFT have exact representations in CLNS and do not require a ROM. We present an error analysis and simulation results for a radix-two FFT that compares a rectangular fixed-point representation of complex numbers to CLNS. We observe that CLNS saves 9–12 bits in word size for 256–1024 point FFTs compared to the fixed-point number system while producing comparable accuracy. The consequence of the word-size advantage is that the number of full adders required for CLNS is significantly smaller than for an equivalent fixed-point implementation. The major cost of CLNS is the memory, which, unlike in conventional LNS, is addressed by both real and imaginary parts. Table-reduction techniques can mitigate this. The simplicity of the CLNS approach requires significantly fewer full adders, which pays for some or all of the extra memory. In applications needing the magnitude of the complex parts, such as a power spectrum, the CLNS approach can actually require less memory than the conventional approach.
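
To make the log/polar idea concrete, here is a minimal Python sketch, assuming a floating-point stand-in for what would be fixed-point log-magnitude and angle fields in hardware. The helper names (`to_clns`, `from_clns`, `clns_mul`, `twiddle`) are hypothetical. It shows only why multiplication by a root of unity is cheap: the twiddle factor has log magnitude 0, so the product is an angle addition. CLNS addition, which is where the abstract's table memory comes in, is not shown.

```python
import cmath
import math


def to_clns(z):
    """Represent a nonzero complex number as (log2 of magnitude, angle)."""
    return (math.log2(abs(z)), cmath.phase(z))


def from_clns(p):
    """Convert a (log2 magnitude, angle) pair back to rectangular form."""
    log_mag, angle = p
    return cmath.rect(2.0 ** log_mag, angle)


def clns_mul(a, b):
    """Multiply in log/polar form: add log magnitudes, add angles."""
    return (a[0] + b[0], a[1] + b[1])


def twiddle(k, n):
    """W_n^k = exp(-2*pi*i*k/n): log magnitude is exactly 0."""
    return (0.0, -2.0 * math.pi * k / n)


z = 3 + 4j
product = from_clns(clns_mul(to_clns(z), twiddle(1, 8)))
reference = z * cmath.exp(-2j * math.pi / 8)
```

If the angle field is stored in units of 2π/n, the twiddle angle becomes the integer -k, which is one way to read the abstract's remark that the roots of unity are exact and need no ROM.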

13.
Symbol timing offset (STO) can result in intersymbol interference (ISI) and, at the FFT output of an OFDM receiver, a phase rotation proportional to the subcarrier index. In order to avoid ISI, the FFT window start position has to be placed ahead of the point estimated by coarse STO estimation algorithms. However, a large forward shift degrades the estimation of data subcarrier channels, which requires interpolation from pilot subcarrier channels, because of the phase rotation caused by the residual STO. In this paper we analyze the influence of STO on channel interpolation and propose a new compensation method for channel correction with STO. Simulation results for the DVB-T application show that the new algorithm not only performs better but also allows a simple residual STO estimation method, so that fewer pilots are required for fine STO estimation than in conventional approaches.
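
The phase rotation described above can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the paper's method: the sizes `N`, `CP`, and `D` and the test symbols are arbitrary choices made here, the channel is ideal, and a direct O(N²) DFT stands in for the FFT. Starting the FFT window `D` samples early, inside the cyclic prefix, multiplies subcarrier k by exp(-j·2π·k·D/N).

```python
import cmath


def dft(x):
    """Direct DFT, adequate for a small demonstration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Direct inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


N, CP, D = 16, 4, 2  # assumed FFT length, cyclic prefix length, early shift
# Arbitrary nonzero QPSK-like subcarrier symbols.
symbols = [complex(1 if (k * 7) % 3 else -1, 1 if k % 2 else -1) for k in range(N)]
time_samples = idft(symbols)
tx = time_samples[-CP:] + time_samples      # prepend the cyclic prefix
window = tx[CP - D: CP - D + N]             # FFT window starts D samples early
rx = dft(window)
rotation = [rx[k] / symbols[k] for k in range(N)]
expected = [cmath.exp(-2j * cmath.pi * k * D / N) for k in range(N)]
```

Because the rotation grows linearly with k, interpolating channel estimates between widely spaced pilots has to track a faster-varying phase as `D` grows, which is the effect the abstract describes.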

14.
This paper presents an area-efficient algorithm for the pipelined processing of the fast Fourier transform (FFT). The proposed algorithm decomposes a discrete Fourier transform (DFT) into two balanced sub-DFTs in order to minimize the total number of twiddle factors to be stored in tables. The radix in the proposed decomposition is adaptively changed according to the remaining transform length so that the transform lengths of the sub-DFTs resulting from the decomposition are as close as possible. An 8192-point pipelined FFT processor designed for digital video broadcasting-terrestrial (DVB-T) systems saves 33% of general multipliers and 23% of the total size of twiddle factor tables compared to a conventional pipelined FFT processor based on the radix-2² algorithm. In addition to the decomposition, several implementation techniques are proposed to reduce area, such as a simple index generator for twiddle factors and add/subtract units combined with the two's complement operation.

15.
16.
A dual-operating-voltage scheme (5 V for peripheral circuits and 3.3 V for the memory array) is shown to be the best approach for a single 5-V 16-Mb DRAM (dynamic random-access memory). This is because the conventional scaling rule cannot be applied to DRAM design due to the inherent DRAM word-line boosting feature. A novel internal voltage generator to realize this approach is presented. Its features are the switching of two reference voltages, a driver using a PMOS-load differential amplifier, and a word-line boost based on the regulated voltage, which can ensure a wider memory margin than conventional circuits. This approach is applied to an experimental 16-Mb DRAM. A 0.5% supply-voltage dependency and a 30-ns recovery time are achieved.

17.
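Liao Luhua (廖露华), Chen Wei (陈伟). 《现代传输》, 2004, 43(3): 74-76
This paper studies receiver-side synchronization for the OFDM transmission system of DVB-T digital terrestrial television. A synchronization scheme is designed according to the principles of OFDM and the DVB-T standard. The scheme uses the redundant information carried by the cyclic guard-interval prefix inserted in the time domain of the OFDM system to perform coarse symbol synchronization and fractional frequency offset estimation. After the FFT, the continual pilots inserted in the frequency domain are used for integer frequency offset estimation, and the scattered pilots are used to estimate the channel impulse response and thereby achieve fine symbol synchronization. Simulations show that the synchronization design achieves the best performance in an additive white Gaussian noise channel and also performs well in a slowly fading Rayleigh channel.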
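
The coarse step described above, using the redundancy of the guard-interval prefix, is commonly done by correlating the received samples with a copy delayed by one FFT length. The Python sketch below illustrates that general technique under assumptions made here, not the paper's exact scheme: the function name `cp_correlate`, the sizes, the noiseless channel with silence around one symbol, and a constant-amplitude sequence standing in for the OFDM symbol body are all hypothetical.

```python
import cmath


def cp_correlate(r, n_fft, n_cp):
    """Estimate the cyclic prefix start and the fractional frequency offset.

    Correlates each n_cp-sample window with the window n_fft samples later.
    Returns (start, cfo), with cfo in units of the subcarrier spacing
    (unambiguous only for |cfo| < 0.5).
    """
    best_d, best_gamma = 0, 0j
    for d in range(len(r) - n_fft - n_cp + 1):
        gamma = sum(r[d + m] * r[d + m + n_fft].conjugate() for m in range(n_cp))
        if abs(gamma) > abs(best_gamma):
            best_d, best_gamma = d, gamma
    # A frequency offset rotates samples n_fft apart by 2*pi*cfo.
    cfo = -cmath.phase(best_gamma) / (2 * cmath.pi)
    return best_d, cfo


N_FFT, N_CP, START, CFO = 64, 16, 10, 0.1   # assumed demo parameters
# Unit-magnitude stand-in for the time-domain body of one OFDM symbol.
body = [cmath.exp(1j * ((n * n) % 7)) for n in range(N_FFT)]
burst = body[-N_CP:] + body                  # cyclic prefix + body
signal = [0j] * START + burst + [0j] * 20
# Apply a frequency offset of CFO subcarrier spacings.
received = [s * cmath.exp(2j * cmath.pi * CFO * n / N_FFT)
            for n, s in enumerate(signal)]
est_start, est_cfo = cp_correlate(received, N_FFT, N_CP)
```

Integer frequency offsets and fine timing are not visible to this correlation, which is consistent with the abstract handling them after the FFT using pilots.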

18.
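In DVB-T 2K systems, conventional synchronization algorithms all rely on the guard interval and pilots, and the receiver's speed cannot exceed 100 km/h. This paper proposes a low-complexity frame synchronization algorithm that uses a superimposed, very-low-energy complex Barker code together with the guard interval. Simulations show that for a high-speed mobile DVB-T system at 150 km/h, the algorithm acquires good frame synchronization very quickly, while the effect of the superimposed complex Barker code on system performance is negligible.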

19.
In this paper, we present a novel fixed-point 16-bit word-width 64-point FFT/IFFT processor developed primarily for application in an OFDM-based IEEE 802.11a wireless LAN baseband processor. The 64-point FFT is realized by decomposing it into a two-dimensional structure of 8-point FFTs. This approach reduces the number of required complex multiplications compared to the conventional radix-2 64-point FFT algorithm. The complex multiplication operations are realized using shift-and-add operations. Thus, the processor does not use a two-input digital multiplier. It also does not need any RAM or ROM for internal storage of coefficients. The proposed 64-point FFT/IFFT processor has been fabricated and tested successfully using our in-house 0.25-μm BiCMOS technology. The core area of this chip is 6.8 mm². The average dynamic power consumption is 41 mW at a 20 MHz operating frequency and a 1.8 V supply voltage. The processor completes one parallel-to-parallel (i.e., when all input data are available in parallel and all output data are generated in parallel) 64-point FFT computation in 23 cycles. These features show that although it has been developed primarily for application in the IEEE 802.11a standard, it can be used for any application that requires fast operation as well as low power consumption.
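
The two-dimensional decomposition mentioned above follows the standard Cooley-Tukey factorization of N = 8 × 8. The Python sketch below is a floating-point illustration of that index mapping only; the function name `fft64_as_8x8` is hypothetical, and the chip's fixed-point, shift-and-add multipliers are not modeled. Input index n = 8a + b and output index k = k1 + 8·k2 give eight 8-point DFTs, a twiddle multiplication by W_64^(b·k1), and eight more 8-point DFTs.

```python
import cmath


def dft(x):
    """Direct DFT, used both as the 8-point kernel and as the reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft64_as_8x8(x):
    """64-point DFT computed as a two-dimensional structure of 8-point DFTs."""
    assert len(x) == 64
    # Stage 1: for each b, an 8-point DFT over x[b], x[b+8], ..., x[b+56].
    stage1 = [dft([x[8 * a + b] for a in range(8)]) for b in range(8)]
    # Inter-stage twiddle factors W_64^(b*k1).
    for b in range(8):
        for k1 in range(8):
            stage1[b][k1] *= cmath.exp(-2j * cmath.pi * b * k1 / 64)
    # Stage 2: for each k1, an 8-point DFT across b yields X[k1 + 8*k2].
    out = [0j] * 64
    for k1 in range(8):
        row = dft([stage1[b][k1] for b in range(8)])
        for k2 in range(8):
            out[k1 + 8 * k2] = row[k2]
    return out


x = [complex(n % 5, (3 * n) % 7) for n in range(64)]
max_error = max(abs(a - b) for a, b in zip(fft64_as_8x8(x), dft(x)))
```

Only the inter-stage step needs general twiddle factors; the multiplications inside an 8-point DFT are by ±1, ±j, and (±1 ± j)/√2, which is what makes a shift-and-add realization plausible.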

20.
In high-performance DSP systems, the memory bandwidth can be improved using high-density interconnect technology and appropriate memory mapping. High-density MCM and flip-chip solder bump technology is used to achieve a system with an I/O bandwidth of 100 Gb/s per cm² of die. The use of DRAMs in these systems usually makes their performance poor, and some algorithms make it difficult to fully utilize the available memory bandwidth. This paper presents the design of a fast Fourier transform (FFT) engine that gives SRAM-like performance in a DRAM-based system. It uses almost 100% of the available burst-mode memory bandwidth. This FFT engine can compute a million-point FFT in 1.31 ms at a sustained computation rate of 8.64 × 10¹⁰ floating-point operations per second (FLOPS). This is at least an order of magnitude better than conventional systems.
