首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到17条相似文献,搜索用时 156 毫秒
1.
针对地面数字视频广播(DVB-T)系统中高速FFT处理器的设计要求,提出了一种新的基16/8混合基算法及其实现结构。采用单个基16/8复用的蝶形运算单元顺序处理,并通过减少乘法器数目,有效降低了硬件消耗;运算单元内部采用“基4+基4/2”级联流水线方式,大大加快了运算速度;此外,应用对称乒乓RAM结构提高了蝶算单元的连续运算能力;并且使用改进的块浮点防溢出机制,以保证运算精度。仿真和实现结果表明该设计具有良好的性能,完全满足实际应用要求。  相似文献   

2.
快速浮点加法器设计研究   总被引:4,自引:2,他引:2  
浮点加法器处于浮点处理器的关键路径,为提高浮点加法器的速度,对浮点加法器的关键部分进行了研究:采用了预测执行,并行运算技术。引用混合加法器,前导“1”检测采用快速的LOPV电路实现,混合加法器由输出选择电路对“ lulp”操作进行合并,提高了运算速度,这些技术在双精度FPU和24位浮点DSP中应用得到了理想的效果。  相似文献   

3.
在信号频谱分析试验中,通过FPGA实现FFT。在MAX+plusⅡ系统环境下,介绍了流水线结构FFT的蝶形单元设计,详解了旋转因子的生成,通过地址产生单元和块浮点单元实现了运算结果的输出,并将其输出结果与Matlab结果进行比较。  相似文献   

4.
本文应用Cooley-TuKey经典算法,采用双加法器和“映像”存储器,获得100%的高效FFT流水蝶形运算,使1024个实数点达到1毫秒的速度,并导出其硬化的寻址、运算序列。 通常计算N点DFT,也即FFT可以用下式表示:  相似文献   

5.
高吞吐率浮点FFT处理器的FPGA实现研究   总被引:3,自引:0,他引:3       下载免费PDF全文
受浮点操作的长流水线延迟及FPGA片上RAM端口数目的限制,传统H可处理器的吞吐率通常只能达到每周期输出一个复数结果。本文用FPGA设计并实现了一种高吞吐率的IEEE754标准单精度浮点FFT处理器,通过改进蝶形计算单元的结构并重新组织FPGA片上RAM的访问,该处理器每周期平均可输出约两个复数计算结果,吞吐率约为传统FFT处理器吞吐率的两倍。对于1024点FFT变换,可在(512+10)*10=5220周期内完成。  相似文献   

6.
高性能基4快速傅里叶变换处理器的设计   总被引:4,自引:1,他引:3       下载免费PDF全文
段小东  顾立志 《计算机工程》2008,34(24):238-240
研究并设计高性能基4快速傅里叶变换(FFT)处理器。采用基4算法、流水线结构的蝶形运算单元,提高了处理速度,使芯片能在更高的时钟频率上工作。运用溢出检测状态机对每个蝶形运算单元输出的数据进行块浮点检查,确保对溢出情况进行正确判断。验证与性能评估结果表明,该FFT处理器具有较高性能。  相似文献   

7.
介绍了使用二维RAM和128个蝶形运算模块并行处理实现高速FFT(快速傅立叶变换)算法的突破性技术。该处理器可以支持最大32K的点复数FFT变换(实部和虚部各16位),转换时间为70μs,技术指标居国际先进水平。  相似文献   

8.
利用CORDIC算法在FPGA中实现可参数化的FFT   总被引:1,自引:4,他引:1  
针对在工业中越来越多的使用到的FFT,本文设计出了一种利用CORDIC算法在FPGA上实现快速FFT的方法。CORDIC实现复数乘法比普通的计算器有结构上的优势,并且采用了循环结构的CORDIC算法大大节约了硬件资源。在FFT的结构上采用了2个16点FFT的计算模块来实现蝶形计算。通过地址控制器和RAM的配合,可以完成8点至2048点的虚部实部均为16位的FFT计算。  相似文献   

9.
一种高效结构的多输入浮点加法器在FPGA上的实现   总被引:4,自引:1,他引:3  
传统的多输入浮点加法运算是通过级联二输入浮点加法器来实现的,这种结构不可避免地使运算时延和所需逻辑资源成倍增加,从而越来越难以满足需要进行高速数字信号处理的需求。本文提出了一种适合在FPGA上实现的浮点数据格式和可以在四级流水线内完成的一种高效多输入浮点加法器结构,并给出了在Xilinx公司Virtex系列芯片上的测试
试数据。  相似文献   

10.
本文介绍了浮点加法器(FPA)的基本运算步骤,归纳阐述了传统的多输入浮点加法器算法,提出了一种改进的并行多输入浮点加法器算法。采用这种改进的算法可以有效地提高运算速度并减少逻辑资源。  相似文献   

11.
Most of the scientific and engineering applications require accurate computations. Double precision floating point computations are not enough for many applications like climate modelling, computational physics, etc. Efficient design of quadruple precision floating point adder is needed for these applications. The proposed multi-mode quadruple precision floating point adder architecture supports four single precision operations in parallel, as well as two double precision operations in parallel and also supports one quadruple precision operation. Compared to existing Quadruple precision floating point adders and Dual mode Quadruple precision floating point adder, the proposed architecture can perform more computations with less area because of resource sharing among different precision operands. The proposed Multi-mode quadruple precision adder supports both normal and subnormal operations and also the exceptional case handling such as infinity, Not a Number (NaN) and zero cases. The proposed adder has been designed and implemented in both ASIC and FPGA. During ASIC implementation with 90 nm technology using the synopsis tool, the proposed Multi-mode quadruple precision floating point adder has a 38.57% smaller area compared to the existing quadruple precision floating point adder. Similarly, the proposed design reduces the area by 29.28% and 35.68% when implemented on Virtex 4 and Virtex 5 FPGAs respectively.  相似文献   

12.
In the conventional floating point multipliers, the rounding stage is usually constructed by using a high speed adder for the increment operation, increasing the overall execution time and occupying a large amount of chip area. Furthermore, it may accompany additional execution time and hardware components for renormalization which may occur by an overflow from the rounding operation. A floating-point multiplier performing addition and IEEE rounding in parallel is designed by optimizing the operational flow based on the characteristics of floating point multiplication operation. A hardware model for the floating point multiplier is proposed and its operational model is algebraically analyzed in this research. The floating point multiplier proposed does not require any additional execution time nor any high speed adder for rounding operation. In addition, the renormalization step is not required because the rounding step is performed prior to the normalization operation. Thus, performance improvement and cost-effective design can be achieved by this approach.  相似文献   

13.
前导0检测(LZD)是浮点加法运算的关键步骤,设计高速的前导0检测算法对提高浮点加法器性能具有重要意义。本文针对64位高性能微处理器浮点运算部件的应用需求,设计并实现了两种基于FFO的前导0检测算法,并对其进行了分析比较。综合结果表明,改进的并行LZD算法具有更高的检测性能,并且通过提前计算出规格化字节移位量,将前导0检测和规格化中的粗粒度移位并行化,进一步减少了整个浮点运算部件的延迟。  相似文献   

14.
Floating-point fast Fourier transform (FFT) has been widely expected in scientific computing and high-resolution imaging applications due to the wide dynamic range and high processing precision. However, it suffers high area and energy overhead problems in comparison to fixed-point implementations. To address these issues, this paper presents an area- and energy-efficient hybrid architecture for floating-point FFT computations. It minimizes the required arithmetic units and reduces the memory usage significantly by combining three different parts. The serial radix-4 butterfly (SR4BF) is used in the single-path delay commutator (SDC) part to minimize the required arithmetic units with 100% adder utilization ratio obtained. A modified single-path delay feedback (MSDF) architecture is proposed to achieve a tradeoff between arithmetic resources and memory usage by using the new half radix-4 butterfly (HR4BF) with 50% adder utilization ratio obtained. The intermediate caching buffer is modified accordingly in the MSDF part. By combining both the advantages on arithmetic units reducing and memory usage optimization in different parts, the optimized area and power are obtained without throughput loss. The logic synthesis results in a 65 nm CMOS technology show that the energy per FFT is about 331.5 nJ for 1024-point FFT computations at 400 MHz. The total hardware overhead is equivalent to 460k NAND2 gates.  相似文献   

15.
研究一种基于现场可编程门阵列实现的高速脉冲压缩处理的硬件结构。设计通用的蝶形处理单元,使其在脉冲压缩处理的3个阶段都能使用,实现了硬件的共享,提高了硬件资源的利用效率。通过可使用原位运算的并行存储器结构,使得每个时钟周期均可完成一次蝶形运算,极大地提高了处理速度。采用块浮点处理单元,兼顾定点的高速率和浮点的高精度。经过实践验证,时钟在100 MHz时完成4 096点的脉冲压缩的时间为140 μs。  相似文献   

16.
提出一种高性能并行快速傅里叶变换(FFT)处理器的设计方案,采用4个蝶形单元进行并行处理,利用改进的无冲突操作数地址映射方式,保证每个周期同时读取和写入16个数据。给出该处理器的FPGA实现,性能评测结果表明,与其他FFT处理器相比,该并行FFT处理器的性能较优,能满足实际应用需求。  相似文献   

17.
刘杰  易茂祥 《计算机工程》2010,36(1):251-252
传统加法器在处理多操作数累加时,必须进行多次循环相加操作。针对该问题设计5操作数并行加法器及其高速进位接口。电路采用多操作数并行本位相加和底层进位级联传递的方式,在一定程度上实现多操作数间的并行操作,减少相加次数。模拟结果验证了该加法器的设计合理性,证明其能缩短累加时间、提高运算效率。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号