1.

邓学禹《电讯技术》2005,45(2):188-191

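To improve the real-time performance of fast Fourier transform (FFT) data processing, this paper exploits the abundant logic resources and high operating speed of field-programmable gate arrays (FPGAs), together with the staged structure of the FFT algorithm, to design a high-speed, high-order FFT that operates as a pipeline. Using the design method described, a 1024-point decimation-in-time FFT with pipelined data input and output, running at over 50 MHz, was implemented on a Xilinx Virtex-II series FPGA.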

2.

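FFT Design and Simulation Based on an FPGA High-Precision Floating-Point Arithmetic Unit (total citations: 1; self-citations: 0; citations by others: 1)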

张雪姣伍萍辉《电子科技》2011,24(12):88-90

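Based on the IEEE floating-point format and the FFT algorithm, this paper proposes an FPGA method for the radix-2 FFT and completes an FFT design built on a high-precision FPGA floating-point arithmetic unit. The butterfly operation and the address generation unit are described in VHDL, and the simulated waveforms largely represent the output results correctly.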

3.

A High-Speed FPGA-Based Short-Time Fourier Transform Implementation

夏明赟蒋涛《通信技术》2012,45(7):113-115

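The short-time Fourier transform (STFT) is algorithmically simple, fast to compute, and easy to implement, and is therefore increasingly used in image processing, speech analysis, signal detection, and parameter estimation. By analyzing the principle of the STFT algorithm, this paper designs a high-speed STFT implementation structure based on a field-programmable gate array (FPGA). The structure makes full use of the computational properties of the butterfly unit, reduces computational complexity while meeting the required time and frequency resolution, and saves hardware resources when running at a high clock rate.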

4.

A Low-Complexity General-Purpose FFT Processor

周敏余松煜归琳《电视技术》2005,(8):35-37

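Building on the low complexity of the R22SDF algorithm, this paper derives a general-purpose FFT algorithm that applies to all 2^n-point FFT computations. The algorithm uses a pipelined structure to meet real-time data processing requirements.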

5.

Analysis of the Decimation-in-Time Radix-2 FFT Algorithm and Its MATLAB Implementation

张登奇李宏民李丹《电子技术》2011,38(2):75-77

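The DFT is a widely used mathematical transform, and MATLAB is a powerful scientific computing language. MATLAB's built-in FFT function solves the problem of computing the DFT quickly, but because it is built in, it does not reveal how the computation is implemented in software. Taking the decimation-in-time radix-2 FFT algorithm as an example, and following the principles and regularities of the fast Fourier transform, the article draws the program flowchart of the algorithm, lists a software implementation in the MATLAB environment, and builds...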
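The decimation-in-time radix-2 procedure named in this abstract can be sketched as follows. This is a minimal illustration in Python rather than the article's MATLAB listing, which is not reproduced in the abstract; the function names `bit_reverse_permute` and `fft_radix2_dit` are hypothetical.

```python
import cmath


def bit_reverse_permute(x):
    """Return a copy of x reordered by bit-reversed index (length must be a power of two)."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        # Reverse the binary digits of the index i over `bits` positions.
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[rev] = complex(x[i])
    return out


def fft_radix2_dit(x):
    """Iterative radix-2 decimation-in-time FFT (illustrative sketch, not the article's code)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = bit_reverse_permute(x)
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)  # twiddle increment for this stage
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                # One butterfly: combine the two half-size DFT outputs.
                top = a[start + k]
                bottom = w * a[start + k + half]
                a[start + k] = top + bottom
                a[start + k + half] = top - bottom
                w *= w_step
        size *= 2
    return a
```

Calling `fft_radix2_dit([1, 0, 0, 0])` transforms a unit impulse, whose spectrum is flat. The in-place stage loop mirrors the log2(N) butterfly stages that the hardware designs in this list map onto pipeline stages.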

6.

Design of a Pipelined FFT (IFFT) Processor for OFDM Systems

钟会新戴宇杰张小兴吕英杰《电视技术》2009,33(Z1)

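By combining a pipelined structure with ping-pong RAM, this paper improves the decimation-in-time Radix-4 algorithm and realizes a VLSI design of an efficient pipelined FFT (IFFT) processor suited to OFDM systems. At a clock frequency of 125 MHz, one 1024-point complex FFT with a 16-bit word length takes 49.57 μs.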

7.

An FPGA Design of the FFT Algorithm

陆旦前陈建平陈晓勇《现代电子技术》2007,30(6):178-181

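Based on an analysis of fast Fourier transform theory, this paper proposes an FPGA design for a decimation-in-frequency radix-4 FFT. Existing FPGA FFT implementations must frequently multiply by several twiddle factors in each butterfly operation; the improved method reduces both the number of twiddle-factor multiplications and the storage they require, which speeds up the butterfly operation. The address mapping method designed here yields the storage address of the required data without any computation. Combined with a ping-pong structure and pipelining, it raises the speed of the FPGA FFT implementation and offers a useful reference for implementing the FFT algorithm.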

8.

Design and Implementation of a High-Throughput Dual-Mode Floating-Point Reconfigurable FFT Processor

魏星黄志洪杨海钢《电子与信息学报》2018,40(12):3042-3050

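A high-throughput, flexibly reconfigurable floating-point fast Fourier transform (FFT) processor can meet the needs of applications such as real-time imaging in advanced radar and high-precision scientific computing. Compared with fixed-point FFT, floating-point arithmetic is more complex, which sharpens the conflict between the throughput of a floating-point FFT and its area and power consumption. To reduce computational complexity, large-point FFTs are first decomposed into cascaded radix-2^k sub-stages of smaller size, and optimized mixed-radix algorithms are proposed for 128/256/512/1024/2048-point FFTs. In addition, building on a proposed new fused add-subtract and dot-product unit that supports both a single-channel single-precision mode and a dual-channel half-precision mode, the paper presents what it describes as the first high-throughput dual-mode floating-point variable-length FFT processor architecture, designed and implemented in a 28 nm standard CMOS process. Experimental results show that the throughput and average output signal-to-quantization-noise ratio are 3.478 GSample/s and 135 dB in single-channel single-precision mode, and 6.957 GSample/s and 60 dB in dual-channel half-precision mode. The normalized throughput-to-area ratio is about 12 times higher than that of other existing floating-point FFT implementations.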

9.

Design Techniques for a High-Speed 64-Point FFT Chip

赵梅丁晓磊朱恩《电子工程师》2007,33(3):13-17

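For the implementation of a high-speed 64-point FFT (fast Fourier transform) processing chip, this paper analyzes the principle of FFT computation and, on that basis, introduces an improved FFT signal flow graph. It describes the functional partitioning of the modules in the FFT processor system. Based on the processor structure and its special addressing scheme, the controller, dual data buffers, address generator, butterfly unit, and I/O control modules are designed at RTL (register-transfer level) in Verilog HDL. Each module and the complete system are functionally simulated and verified in ModelSim, and simulation waveforms of some key modules are given. The RTL design pays attention to hardware implementation and circuit synthesizability to ensure that the resulting hardware matches the expected performance.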

10.

Design and Application of a Novel High-Efficiency FFT Processor

傅亮徐元欣张明《电视技术》2005,(8):32-34

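This paper describes an efficient pipelined FFT (fast Fourier transform) structure that both raises the data rate and effectively reduces design size. As a key part of an OFDM (orthogonal frequency division multiplexing) system, the FFT design largely determines the implementation size of the whole system. As one application, the authors use this FFT structure in a DVB-T receiver to demodulate both the 2K and 8K modes. The structure can also be readily applied wherever an FFT is needed and makes it easy to support multiple modes side by side.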

11.

A CMOS floating-point vector-arithmetic unit

Timmermann D. Rix B. Hahn H. Hosticka B.J. 《Solid-State Circuits, IEEE Journal of》1994,29(5):634-639

This work describes a floating-point arithmetic unit based on the CORDIC algorithm. The unit computes a full set of high level arithmetic and elementary functions: multiplication, division, (co)sine, hyperbolic (co)sine, square root, natural logarithm, inverse (hyperbolic) tangent, vector norm, and phase. The chip has been integrated in 1.6 μm double-metal n-well CMOS technology and achieves a normalized peak performance of 220 MFLOPS.

12.

Fast floating-point normalisation unit realised using NOR planes

《Electronics letters》2002,38(16):857-858

A new floating-point (FP) normalisation unit scheme is presented that achieves enhanced performance by merging a leading zero counter (LZC) and a normalisation shifter. The LZC and the shift decoder are combined by using NOR planes to generate control signals directly to the normalisation shifter. The chip has been fabricated with a five-metal 0.18 μm CMOS process and performs the 64 bit FP normalisation within 1.4 ns.
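As a behavioural illustration of what a normalisation unit computes, the sketch below counts leading zeros and then shifts the significand left by that amount while adjusting the exponent. It models only the function, not the merged NOR-plane circuit the letter describes; the names `leading_zero_count` and `normalise`, and the choice to leave a zero significand unchanged, are assumptions made for this example.

```python
def leading_zero_count(value, width=64):
    """Number of zero bits above the most significant set bit of a width-bit value."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    return width - value.bit_length()


def normalise(significand, exponent, width=64):
    """Shift the significand left until its top bit is set, lowering the exponent to match.

    Behavioural sketch only: here the count and the shift are two separate steps,
    whereas the letter merges the counter and the shift decoder in hardware.
    """
    if significand == 0:
        # Assumption for this sketch: zero has no leading one, so it is returned unchanged.
        return 0, exponent
    shift = leading_zero_count(significand, width)
    return (significand << shift) & ((1 << width) - 1), exponent - shift
```

For instance, `normalise(1, 0)` moves the single set bit to position 63 and lowers the exponent by 63.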

13.

Design of a coarse-grained reconfigurable architecture with floating-point support and comparative study

Manhwee Jo Dongwook Lee Kyuseung Han Kiyoung Choi 《Integration, the VLSI Journal》2014

With a huge increase in demand for various kinds of compute-intensive applications in electronic systems, researchers have focused on coarse-grained reconfigurable architectures because of their advantages: high performance and flexibility. This paper presents FloRA, a coarse-grained reconfigurable architecture with floating-point support. A two-dimensional array of integer processing elements in FloRA is configured at run-time to perform floating-point operations as well as integer operations. Fabricated using a 130 nm process, the total area overhead due to additional hardware for floating-point operations is about 7.4% compared to the previous architecture, which does not support floating-point operations. The fabricated chip runs at 125 MHz clock frequency and 1.2 V power supply. Experiments show 11.6× speedup on average compared to ARM9 with a vector-floating-point unit for integer-only benchmark programs as well as programs containing floating-point operations. Compared with other similar approaches including XPP and Butter, the proposed architecture shows much higher performance for integer applications, while maintaining about half the performance of Butter for floating-point applications.

14.

A 320 MFLOPS CMOS floating-point processing unit for superscalar processors

Ide N. Fukuhisa H. Kondo Y. Yoshida T. Nagamatsu M. Junji M. Yamazaki I. Ueno K. 《Solid-State Circuits, IEEE Journal of》1993,28(3):352-361

A CMOS pipelined floating-point processing unit (FPU) for superscalar processors is described. It is fabricated using a 0.5 μm CMOS triple-metal-layer technology on a 61 mm² die. The FPU has two execution modes to meet precise scientific computations and real-time applications. It can start two FPU operations in each cycle, and this achieves a peak performance of 160 MFLOPS double or single precision with an 80 MHz clock. Furthermore, the original computation mode, twin single-precision computation, doubles the peak performance and delivers 320 MFLOPS single precision. Its full bypass reduces the latency of operations, including load and store, and achieves an effective throughput even in nonvectorizable computations. An out-of-order completion is provided by using a new exception prediction method and a pipeline stall technique.

15.

Design and Implementation of an FPU Adder

田祎颜军《电子设计工程》2012,20(12):13-15,20

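The floating-point adder is the core arithmetic component of a floating-point unit and the basis for the operations of floating-point instructions, so optimizing its design is key to improving the speed and precision of floating-point computation. This article presents a design method from the perspectives of the floating-point adder algorithm and its circuit implementation, with design and verification carried out in VHDL in Quartus II. The adder controls its operation with a state machine, which the article reports effectively lowers power consumption, raises speed, and improves performance.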

16.

A simplified synthesis of transmission lines with a tree structure (total citations: 1; self-citations: 0; citations by others: 1)

D. Zhou S. Su F. Tsui D. S. Gao J. S. Cong 《Analog Integrated Circuits and Signal Processing》1994,5(1):19-30

The limiting factor for high-performance systems is being set by interconnection delay rather than transistor switching speed. The advances in circuit speed and density are placing increasing demands on the performance of interconnections, for example chip-to-chip interconnection on multichip modules. To address this extremely important and timely research area, we analyze in this paper the circuit property of a generic distributed RLC tree which models interconnections in high-speed IC chips. The presented result can be used to calculate the waveform and delay in an RLC tree. The result on the RLC tree is then extended to the case of a tree consisting of transmission lines. Based on an analytical approach, a two-pole circuit approximation is presented to provide a closed form solution. The approximation reveals the relationship between circuit performance and the design parameters, which is essential to IC layout designs. A simplified formula is derived to evaluate the performance of VLSI layout.

17.

A fully pipelined single-precision floating-point unit in the synergistic processor element of a CELL processor

Hwa-Joon Oh Mueller S.M. Jacobi C. Tran K.D. Cottier S.R. Michael B.W. Nishikawa H. Totsuka Y. Namatame T. Yano N. Machida T. Dhong S.H. 《Solid-State Circuits, IEEE Journal of》2006,41(4):759-771

The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical single-precision multiply-add operations. Since exact rounding, exceptions, and de-norm number handling are not important to multimedia applications, IEEE correctness on the single-precision floating-point numbers is sacrificed for performance and simple design. It employs fine-grained clock gating for power saving. The design has 768K transistors in 1.3 mm², fabricated in 90-nm SOI technology. Correct operations have been observed up to 5.6 GHz with 1.4 V and 56°C, delivering 44.8 GFlops. Architecture, logic, circuits, and integration are codesigned to meet the performance, power, and area goals.
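The quoted peak figure is consistent with a simple lanes-times-clock estimate. The sketch below assumes each of the four SIMD lanes completes one multiply-add per cycle and that a multiply-add counts as two floating-point operations; the abstract does not spell out this accounting, so treat it as an assumption. The function name `peak_gflops` is hypothetical.

```python
def peak_gflops(simd_lanes, flops_per_lane_per_cycle, clock_ghz):
    """Peak throughput in GFLOPS as lanes x operations per lane per cycle x clock in GHz."""
    return simd_lanes * flops_per_lane_per_cycle * clock_ghz


# Assumed accounting: 4 lanes, one multiply-add (2 flops) per lane per cycle, 5.6 GHz.
cell_spe_peak = peak_gflops(simd_lanes=4, flops_per_lane_per_cycle=2, clock_ghz=5.6)
```

Under those assumptions the estimate comes to 44.8 GFLOPS, matching the figure reported at 5.6 GHz.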

18.

2.44-GFLOPS 300-MHz floating-point vector-processing unit for high-performance 3D graphics computing

Ide N. Hirano M. Endo Y. Yoshioka S. Murakami H. Kunimatsu A. Sato T. Kamei T. Okada T. Suzuoki M. 《Solid-State Circuits, IEEE Journal of》2000,35(7):1025-1033

A vector unit for high-performance three-dimensional graphics computing has been developed. We implement four floating-point multiply-accumulate units, which execute multiply-add operations with one throughput; one floating-point divide/square root unit, which executes division and square-root operations with six cycles at 300 MHz; and one vector general-purpose register file, which has 128 bits×32 words. The parallel execution of all units delivers a peak performance of 2.44 GFLOPS at 300 MHz.

19.

A Floating-Point DSP Hard-Core Architecture for FPGAs

赵赫黄志洪余乐杨海钢许仕龙郝亚男《太赫兹科学与电子信息学报》2019,17(3):524-530

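This paper proposes a floating-point digital signal processor (DSP) hard-core architecture that remains compatible with fixed-point arithmetic while providing good support for floating-point arithmetic. At present, the major field-programmable gate array (FPGA) vendors implement floating-point arithmetic as soft cores, that is, by mapping floating-point algorithms onto the chip and realizing them with logic resources and DSP blocks. Compared with this conventional approach, the proposed hard core completes floating-point operations using DSP blocks alone, without occupying other logic resources in the FPGA. The design takes load and delay effects fully into account and inserts multiple pipeline stages, significantly improving floating-point computation efficiency. The proposed floating-point DSP hard core is designed and implemented in the SMIC (中芯国际) 28 nm process. Simulation results show that the proposed hard core achieves 0.4 GFLOPS for single floating-point addition and multiplication.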

20.

Design and Implementation of CORDIC-Based Floating-Point Transcendental Functions

《信息技术》2017,(9):113-116

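This paper studies the extended CORDIC algorithm and, on that basis, designs and implements a hardware accelerator for floating-point transcendental functions. In vectoring mode, the accelerator computes three floating-point transcendental functions: arctangent, inverse hyperbolic tangent, and logarithm. The design uses the IEEE-754 single-precision floating-point format, and the computation is divided into three data paths, X, Y, and Z, with five processing steps: exponent alignment, mapping, iteration, compensation, and normalization. Experimental results show that the circuit reaches a maximum operating frequency of 221 MHz on a Xilinx Virtex6 device, with a reduced number of computation cycles.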
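The vectoring-mode iteration over the X, Y, and Z data paths can be illustrated with the basic circular CORDIC recurrence for the arctangent. This is a textbook sketch in Python floating point, not the paper's extended algorithm: it omits the alignment, mapping, compensation, and normalization steps and does not cover the hyperbolic functions. The function name `cordic_arctan` and the restriction to x > 0 are assumptions made for the example.

```python
import math


def cordic_arctan(y, x, iterations=32):
    """Approximate atan(y / x) for x > 0 with circular CORDIC in vectoring mode."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    z = 0.0
    for i in range(iterations):
        step = 2.0 ** -i          # corresponds to a right shift by i bits in hardware
        angle = math.atan(step)   # hardware would read this from a small lookup table
        if y > 0:
            # Rotate clockwise to push y toward zero; accumulate the angle in z.
            x, y, z = x + y * step, y - x * step, z + angle
        else:
            # Rotate counterclockwise; subtract the angle from z.
            x, y, z = x - y * step, y + x * step, z - angle
    return z
```

For example, `cordic_arctan(1.0, 1.0)` approaches π/4 as the iteration count grows. Each iteration uses only shifts, additions, and one table lookup, which is why the method suits hardware accelerators.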