Similar literature
1.
A parallel-data VLSI architecture for computation of the fast Fourier transform (FFT) is described. The processor is based on a computationally efficient vector rotate algorithm. Use of a 2-dimensional pipeline configuration allows a radix-2 butterfly operation to be performed once every system clock cycle (250 ns) to generate real or imaginary transform components. The architecture is considered to be a computationally efficient VLSI approach for high-bandwidth computation of the FFT. The design and performance of an 8-bit FFT butterfly processor are described.

2.
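A new memory-less twiddle factor generation circuit for a radix-2 1024-point FFT is designed. The circuit uses several logic blocks to generate data and then combines these data to synthesize the required twiddle factors. Power analysis with Synopsys Power Compiler shows that the circuit, synthesized in a TSMC 0.25 μm CMOS process, consumes 2 mW at 50 MHz. This twiddle factor generation circuit is well suited to low-power designs, especially mobile communication and other handheld devices.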

3.
This paper addresses the design of power-efficient dedicated structures for radix-2 decimation-in-time (DIT) pipelined butterflies, aimed at implementing a low-power fast Fourier transform (FFT) using adder compressors with a new XOR gate topology. In FFT computation the butterflies play a central role, since they perform the calculation of the complex terms. Because this calculation multiplies input data by the appropriate coefficients, optimizing the butterfly can contribute to reducing the power consumption of FFT architectures. In this paper, different dedicated structures for the 16-bit pipelined radix-2 DIT butterfly, running at 100 MHz, are implemented, where the main goal is to minimize both the number of real multipliers and the critical path of the structures. This is done by changing the structure of the complex multipliers and applying them in the butterflies. The butterflies were synthesized with the Cadence Encounter RTL Compiler tool and the XFAB MOSLP 0.18 μm library. Area and power consumption results are presented for the synthesized butterflies. For power consumption, switching activity analysis is performed by applying 10,000 input vectors to the butterflies. The main results show that combining the pipeline approach with efficient adder compressors that use the new XOR gate topology significantly reduces the power consumption of the butterflies.
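
The abstract mentions reducing the number of real multipliers by restructuring the complex multiplier, but does not say which structure is used. As an illustration only, the sketch below shows one well-known restructuring that trades one real multiplication for three extra additions, and uses it in a radix-2 DIT butterfly. The function names are hypothetical and are not taken from the paper.

```python
def complex_mul_4(a, b, c, d):
    """(a + jb)(c + jd) with 4 real multiplications and 2 additions."""
    return a * c - b * d, a * d + b * c


def complex_mul_3(a, b, c, d):
    """Same product with 3 real multiplications and 5 additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    # k1 - k3 = ac - bd (real part), k1 + k2 = ad + bc (imaginary part)
    return k1 - k3, k1 + k2


def dit_butterfly(a, b, w, mul=complex_mul_3):
    """Radix-2 DIT butterfly: returns (a + w*b, a - w*b)."""
    pr, pi = mul(b.real, b.imag, w.real, w.imag)
    p = complex(pr, pi)
    return a + p, a - p


if __name__ == "__main__":
    print(complex_mul_4(1, 2, 3, 4))
    print(complex_mul_3(1, 2, 3, 4))
    print(dit_butterfly(1 + 0j, 1 + 0j, -1 + 0j))
```

Whether three multipliers are preferable to four in hardware depends on the relative cost of adders and multipliers and on the critical path, which is the trade-off the abstract describes.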

4.
《现代电子技术》2015,(21):145-148
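To address the delay of the ripple-carry adder, a parallel-prefix adder based on the Sklansky structure is adopted. By optimizing each module of the parallel-prefix adder, a 24-bit parallel-prefix adder is designed and implemented. A delay comparison with a 24-bit ripple-carry adder shows that the adder with the Sklansky parallel-prefix structure effectively increases operation speed.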
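
To make the Sklansky structure concrete, here is a minimal bit-level software model of a parallel-prefix adder. It is a behavioural sketch under my own assumptions (function name, default 24-bit width, list-based signals), not the optimized circuit from the paper. Each pass of the outer loop corresponds to one prefix level, so 24 bits need 5 levels instead of a 24-stage carry chain.

```python
def sklansky_add(a, b, width=24, carry_in=0):
    """Add two unsigned integers with a Sklansky prefix network model.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Bitwise generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]
    level = 0
    while (1 << level) < width:
        for i in range(width):
            if (i >> level) & 1:
                # Top bit of the lower half of the current block.
                j = ((i >> level) << level) - 1
                G[i] = G[i] | (P[i] & G[j])
                P[i] = P[i] & P[j]
        level += 1
    # carries[i] is the carry into bit i; G[i], P[i] now span bits 0..i.
    carries = [carry_in] + [G[i] | (P[i] & carry_in) for i in range(width)]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[width]


if __name__ == "__main__":
    print(sklansky_add(0xFFFFFF, 1))
    print(sklansky_add(123456, 654321))
```

In this model the node at position j is never rewritten during the level that reads it, so updating the lists in place is safe.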

6.
陈海燕  杨超  刘胜  刘仲 《电子学报》2016,44(2):241-246
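As SIMD (Single Instruction Multiple Data stream) DSPs (Digital Signal Processors) integrate more and more processing units on chip, the flexibility and bandwidth efficiency of parallel memory access have a growing impact on actual computational performance. This paper analyzes in detail the memory access problems faced by the radix-2 FFT (Fast Fourier Transform) parallel algorithm on general SIMD DSPs. It uses simple XOR logic on part of the address bits to translate SIMD parallel memory access addresses, achieving conflict-free SIMD parallel memory access for FFT computation. It also proposes several vector memory access instructions with special shuffle modes, which completely eliminate the extra shuffle instructions that radix-2 FFT computation otherwise requires on SIMD architectures. Finally, the approach is applied to the optimized design of the vector memory (VM) in YHFT-Matrix2, a 16-way SIMD digital signal processor. Test results show that the VM optimized with this SIMD parallel memory structure achieves fully pipelined, conflict-free parallel memory access and 100% parallel memory bandwidth utilization for FFT computation at the cost of 18% more hardware. Compared with the design before optimization, FFTs of different sizes achieve speedups of 1.32 to 2.66.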
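
The abstract does not give the exact address translation, so the following is only a small illustration of why XOR logic on address bits can remove bank conflicts. It uses the textbook two-bank case, where the bank index is the parity (XOR of all bits) of the address: the two operands of a radix-2 butterfly differ in exactly one address bit, so they always fall in different banks. The 16-way mapping used in YHFT-Matrix2 is not reproduced here, and all names are hypothetical.

```python
def bank_of(addr):
    """Bank index = XOR (parity) of all address bits."""
    return bin(addr).count("1") & 1


def butterfly_pairs(n, stage):
    """Operand address pairs for one stage of an in-place radix-2 FFT."""
    half = 1 << stage
    return [(i, i | half) for i in range(n) if not i & half]


def conflict_free(n, bank=bank_of):
    """True if both operands of every butterfly lie in different banks."""
    stages = n.bit_length() - 1
    return all(
        bank(lo) != bank(hi)
        for s in range(stages)
        for lo, hi in butterfly_pairs(n, s)
    )


if __name__ == "__main__":
    print(conflict_free(1024))                       # parity mapping
    print(conflict_free(1024, bank=lambda a: a & 1)) # plain low-bit interleaving
```

Plain low-bit interleaving fails from the second stage on, because operands such as addresses 0 and 2 land in the same bank.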

7.
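In view of the widespread demand for the fast Fourier transform in digital signal processing, an implementation scheme for an 8-point radix-2 decimation-in-time FFT processor is presented, based on an analysis of the algorithm. The design was synthesized for a Xilinx xc3s 1500 series device and simulated with Modelsim SE6.0. Experimental results show that the processor functions correctly and achieves high speed and precision.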
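
For reference, a software model of the radix-2 decimation-in-time algorithm that such a processor implements might look like the sketch below. It is a floating-point illustration with hypothetical function names, not the fixed-point hardware design from the paper: the input is reordered by bit reversal and then log2(N) butterfly stages are applied in place.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dit_radix2(x):
    """Iterative radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    bits = n.bit_length() - 1
    if n < 1 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    # Bit-reversed input order gives natural output order.
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                t = w * data[start + k + half]
                u = data[start + k]
                data[start + k] = u + t
                data[start + k + half] = u - t
        size *= 2
    return data


if __name__ == "__main__":
    print(fft_dit_radix2([1, 0, 0, 0, 0, 0, 0, 0]))
```

For 8 points this performs 3 stages of 4 butterflies each.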

8.
Sharma  A. 《Electronics letters》2004,40(9):536-537
A radix-2 fast Fourier transform (FFT) based approach using a gamma-ray spectrometer for the identification of radioactive minerals is presented. The proposed method reduces the complexity of diagnosis associated with the spectrometer by transferring time-domain analysis to frequency-domain analysis. The operation of PDA/DSP is discussed pragmatically.

9.
A dynamic scaling FFT processor for DVB-T applications   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper presents an 8192-point FFT processor for DVB-T systems, in which a three-step radix-8 FFT algorithm, a new dynamic scaling approach, and a novel matrix prefetch buffer are exploited. About 64 Kbit of memory space can be saved in the 8 K point FFT by the proposed dynamic scaling approach. Moreover, with data scheduling and pre-fetched buffering, single-port memory can be adopted without degrading throughput rate. A test chip for 8 K mode DVB-T system has been designed and fabricated using 0.18-μm single-poly six-metal CMOS process with core area of 4.84 mm². Power dissipation is about 25.2 mW at 20 MHz.

10.
A VLSI array processor for 16-point FFT   Cited by: 1 (self-citations: 0, citations by others: 1)
An implementation of a two-dimensional array processor for fast Fourier transform (FFT) using a 2-μm CMOS technology is presented. The array processor, which is dedicated to 16-point FFT, implements a 4×4 mesh array of 16 processing elements (PEs) working in parallel. Design considerations in both the chip level and the PE level are examined. A layout design methodology based on bit-slice units (BSUs) results in a very simple design, easy debugging, and a regular interconnection scheme through abutment. It contains about 48,000 transistors on an area of 53.52 mm², excluding the 83-pad area, and operation is on a 15-MHz clock. The array processor performs 24.6 million complex multiplications per second, and computes a 16-point FFT in 3 μs.

11.
Reversible logic has received much attention in recent years where computation with minimum energy consumption is concerned. Interest in reversible logic has been sparked in particular by its applications in technologies such as quantum computing, low-power CMOS design, optical information processing and nanotechnology. This article proposes two new reversible logic gates, ZRQ and NC. The first gate, ZRQ, not only implements all Boolean functions but can also be used to design optimised adder/subtractor architectures. One of the prominent features of the proposed ZRQ gate is that it can work by itself as a reversible full adder/subtractor unit. The second gate, NC, implements the overflow detection logic of a Binary Coded Decimal (BCD) adder. This article proposes two approaches to designing a novel reversible BCD adder using the new reversible gates. A comparison shows that the proposed designs are more optimised than the existing designs in terms of number of gates, garbage outputs, quantum costs and unit delays.

12.
A radix-8 wafer scale FFT processor   Cited by: 2 (self-citations: 0, citations by others: 2)
Wafer Scale Integration promises radical improvements in the performance of digital signal processing systems. This paper describes the design of a radix-8 systolic (pipeline) fast Fourier transform processor for implementation with wafer scale integration. By the use of the radix-8 FFT butterfly wafer that is currently under development, continuous data rates of 160 MSPS are anticipated for FFTs of up to 4096 points with 16-bit fixed point data.

13.
14.
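This paper presents a 64-point radix-4 FFT processor in which a modified CORDIC (MVR-CORDIC) processing unit replaces the complex multiplier of a conventional FFT processor. While maintaining SQNR performance, the modified CORDIC unit completes a complex multiplication with only a very small number of shift-and-add operations, which shortens the time needed for a basic butterfly operation and reduces area overhead. The FFT processor uses two independent RAMs and stores intermediate data in a "ping-pong" fashion to save data storage time, which increases the speed of an FFT computation. The FFT processor was verified on an FPGA, and the results show that a 64-point FFT takes less than 1 μs on average.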
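
To show how a CORDIC unit can stand in for a complex multiplier, the sketch below rotates a vector by a twiddle angle using the conventional CORDIC iteration. It is a floating-point model under stated assumptions: it is the standard algorithm, not the MVR-CORDIC variant from the paper, multiplication by a power of two stands in for a hardware shift, the input angle is assumed to lie in [-π, π], and the function name is hypothetical.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) counter-clockwise by `angle` radians, |angle| <= pi."""
    # Fold the angle into [-pi/2, pi/2], where the iteration converges.
    if angle > math.pi / 2:
        x, y, angle = -x, -y, angle - math.pi
    elif angle < -math.pi / 2:
        x, y, angle = -x, -y, angle + math.pi
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if angle >= 0 else -1.0
        shift = 2.0 ** -i  # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        angle -= d * math.atan(shift)  # table lookup in hardware
        gain *= math.sqrt(1.0 + shift * shift)
    # The micro-rotations scale the vector by a fixed gain; undo it here.
    return x / gain, y / gain


if __name__ == "__main__":
    # Multiply 1 + 0j by the twiddle factor exp(-j*2*pi*5/64).
    print(cordic_rotate(1.0, 0.0, -2 * math.pi * 5 / 64))
```

Multiplying a sample by a twiddle factor exp(-jθ) is the same as rotating it by -θ, which is why the rotation can replace the four real multiplications of a direct complex multiply.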

15.
An n2n round-robin arbiter (RRA) searches its n inputs for a 1, starting from the highest-priority input. It picks the first 1 and outputs its index in one-hot encoding. RRA aims to be fair to its inputs and maintains fairness by simply rotating the input priorities, i.e., the last arbitrated input becomes the lowest-priority input. Arbiters are used to multiplex the usage of shared resources among requestors as well as in dispatch logic where the purpose is load balancing among multiple resources. Today, arbiters have hundreds of ports and usually need to run at very high clock speeds. This article presents a new gate-level RRA circuit called Thermo Coded-Parallel Prefix Arbiter (TC-PPA) that scales to any number of requestors. It uses parallel prefix network topologies (borrowed from fast carry lookahead adders) to generate a thermometer-coded pointer, thus greatly reducing critical path. Code generators were written not only for TC-PPA but also for the 5 highly competitive circuits in the literature (9 including their variants), and a rich set of timing/area results were obtained using a standard-cell based logic synthesis flow with a novel iterative strategy based on binary search. Synthesis runs include results with wire-load and without. Results show that for 54 or more ports (except 256) TC-PPA offers the best timing (lowest latency) as well as competitive area. Contributions also include transaction-level simulations that show when pipelining is used to boost clock rate, latency and input FIFO sizes are adversely affected, and hence pipelining cannot be indiscriminately exploited to trim clock period.
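
The round-robin behaviour described above can be captured in a short behavioural model. The sketch below is an assumption-laden illustration, not the gate-level TC-PPA circuit: requests and grants are integers used as bit vectors, the thermometer-coded pointer is built with integer arithmetic rather than a parallel prefix network, and all names are hypothetical.

```python
def thermometer(pointer, n):
    """n-bit mask with ones at bit positions pointer .. n-1."""
    return ((1 << n) - 1) & ~((1 << pointer) - 1)


def lowest_one_hot(bits):
    """Isolate the least significant set bit."""
    return bits & -bits


def rr_arbitrate(requests, pointer, n):
    """Return (one-hot grant, next pointer) for an n-input arbiter.

    `pointer` is the index of the current highest-priority input.
    """
    requests &= (1 << n) - 1
    if requests == 0:
        return 0, pointer
    # Prefer requests at or above the pointer, otherwise wrap around.
    upper = requests & thermometer(pointer, n)
    grant = lowest_one_hot(upper) if upper else lowest_one_hot(requests)
    # The granted input becomes the lowest-priority input next time.
    next_pointer = grant.bit_length() % n
    return grant, next_pointer


if __name__ == "__main__":
    pointer = 0
    for _ in range(4):
        grant, pointer = rr_arbitrate(0b1011, pointer, 4)
        print(f"{grant:04b}", pointer)
```

With inputs 0, 1 and 3 requesting continuously, the grants rotate through those three inputs in turn.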

16.
This research work focuses on the design of a high-resolution fast Fourier transform (FFT)/inverse FFT (IFFT) processor for constraint analysis purposes. Among the major drawbacks of such high-resolution FFT processors is the high power consumption that results from the structural complexity and computational inefficiency of floating-point calculations. A parallel pipelined architecture is therefore proposed that statically scales the resolution of the processor to suit adequate trade-off constraints. Quantisation is applied to provide an approximation that addresses the finite word-length constraints of digital signal processing. An optimum operating mode is proposed, based on the signal-to-quantisation-noise ratio (SQNR) as well as the statistical theory of quantisation, to minimise the trade-off issues associated with selecting the most application-efficient floating-point processing capability in contrast to the resolution quality.

17.
We propose a new VLSI architecture for an FFT processor. Our architecture uses few processing elements and can be laid out in a mesh-interconnected pattern. We show how to compute the discrete Fourier transform at n points with an optimal speed-up as long as the memory is large enough. The control is shown to be simple and easily implementable in VLSI.

18.
A vector-quantization (VQ) processor system has been developed aiming at real-time compression of motion pictures using a 0.6-μm triple-metal CMOS technology. The chip employs a fully parallel single-instruction, multiple-data architecture having a two-stage pipeline. Each pipeline segment consists of 19 cycles, thus enabling the execution of a single VQ operation in only 19 clock cycles. As a result, it has become possible to encode a full-color picture of 640×480 pixels in less than 33 ms, i.e., the real-time compression of moving pictures has become available. The chip is scalable up to eight-chip master-slave configuration in conducting fully parallel search for 2-K template vectors. The chip operates at 17 MHz with a power dissipation of 0.29 W under a power-supply voltage of 3.3 V.

19.
Detailed investigations have been carried out on a Josephson parallel full adder as an example of a functional circuit using Josephson junctions. This circuit can be constructed with fewer devices as compared with conventional Josephson adders. Two-junction interferometers (d.c.-SQUIDs) are utilized as switching elements of the circuit. Only (2N+1) d.c.-SQUIDs are required for construction of an N bit circuit. Discussions focus on design theory of the full adder circuit. An experimental 4 bit circuit operation is also demonstrated.

20.
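Based on an analysis of the mixed-radix 4/2 FFT algorithm and on optimized methods for storing and reading the sampled data and twiddle factors, this paper proposes an algorithm that unifies address generation for N = 2^m points, for both odd and even m, in a single function. It also designs simple circuits for generating the insertion value and for fast control of the insertion position. As a result, an address generation unit for FFTs of arbitrary length can be built from one counter and a single set of address generation hardware under simple switch-mode control. Within one clock cycle, this unit generates the addresses for reading the required twiddle factors and for accessing four operands in parallel. The FFT processor designed in this paper completes one radix-4 butterfly or two radix-2 butterflies per cycle. It offers a programmable transform length on top of high throughput and low resource usage, and it avoids repeated reads of twiddle factors, which reduces power consumption.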
