Found 20 similar documents; search took 0 ms
1.
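A hardware implementation of an FPGA-based floating-point multiplier is studied and completed. The underlying principle is described in detail, with emphasis on the structure of the multiplier, and the design passed data verification. Synthesis and simulation tests were carried out in MAX+PLUS II.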
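As a rough illustration of what a floating-point multiplier datapath computes, the sketch below multiplies two IEEE 754 single-precision values at the bit level: XOR the signs, add the exponents, multiply the 24-bit significands, then normalize and round. It is not the design from the entry above. The function names, the single-precision format, and the round-to-nearest-even mode are assumptions made for illustration, and only normal (non-zero, finite) operands and results are handled.

```python
import struct

def f32_bits(x: float) -> int:
    """Round a Python float to single precision and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def fp32_mul(a_bits: int, b_bits: int) -> int:
    """Multiply two normal single-precision numbers given as bit patterns."""
    sign = ((a_bits >> 31) ^ (b_bits >> 31)) & 1
    ea = (a_bits >> 23) & 0xFF
    eb = (b_bits >> 23) & 0xFF
    if not (0 < ea < 255 and 0 < eb < 255):
        raise ValueError("sketch handles normal operands only")

    # Restore the hidden leading 1 to get 24-bit significands.
    ma = (a_bits & 0x7FFFFF) | 0x800000
    mb = (b_bits & 0x7FFFFF) | 0x800000

    prod = ma * mb            # 48-bit product, in [2^46, 2^48)
    exp = ea + eb - 127       # remove one copy of the bias

    # Normalize: the leading 1 sits at bit 47 or bit 46.
    if prod & (1 << 47):
        shift = 24
        exp += 1
    else:
        shift = 23

    mant = prod >> shift                  # 24 bits including the hidden 1
    rem = prod & ((1 << shift) - 1)       # discarded low bits
    half = 1 << (shift - 1)

    # Round to nearest, ties to even.
    if rem > half or (rem == half and (mant & 1)):
        mant += 1
        if mant == (1 << 24):             # rounding overflowed the significand
            mant >>= 1
            exp += 1

    if not (0 < exp < 255):
        raise ValueError("result overflows or underflows; not handled here")
    return (sign << 31) | (exp << 23) | (mant & 0x7FFFFF)
```

In hardware, the significand product would typically come from a dedicated multiplier array and the steps would be pipelined; the Python version only mirrors the arithmetic, not the timing.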
2.
《Acta Electronica Sinica (English Edition)》2016,(6):1063-1070
Fast Fourier transform (FFT) accelerators and the Coordinate rotation digital computer (CORDIC) algorithm play important roles in signal processing. We propose a configurable floating-point FFT accelerator based on CORDIC rotation, in which twiddle direction prediction is presented to reduce hardware cost and twiddle angles are generated in real time to save memory. To finish CORDIC rotation efficiently, a novel approach is presented that uses segmented-parallel iteration and compress iteration based on CSA, and redundant CORDIC is used to reduce the latency of each iteration. To prove the efficiency of our FFT accelerator, four FFT accelerators are prototyped in an FPGA chip to perform a batch-FFT. Experimental results show that our structure, which is composed of four butterfly units and finishes FFT with sizes ranging from 64 to 8192 points, occupies 33230 (3%) REGs and 143006 (30%) LUTs. The clock frequency can reach 122 MHz. The resources of the double-precision FFT are only about 2.5 times those of single-precision, while the theoretical value is 4. What's more, only 13331 cycles are required to implement an 8192-point double-precision FFT with four butterfly units in parallel.
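For readers unfamiliar with CORDIC rotation, the following is a minimal sketch of the textbook rotation-mode algorithm, which rotates a vector by a given angle using only shift-and-add style micro-rotations. It does not reproduce the redundant CORDIC, twiddle direction prediction, or CSA-based iterations mentioned in the abstract. The function name `cordic_rotate`, the iteration count, and the use of Python floats in place of fixed-point hardware arithmetic are all assumptions for illustration.

```python
import math

def cordic_rotate(x: float, y: float, angle: float, iterations: int = 32):
    """Rotate (x, y) by `angle` radians, assuming |angle| <= pi/2.

    Each step rotates by +/- atan(2^-i); the direction is chosen so the
    residual angle z is driven toward zero.
    """
    # The micro-rotations stretch the vector by a known constant gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0
        scale = 2.0 ** -i                 # a right shift by i bits in hardware
        x, y = x - d * y * scale, y + d * x * scale
        z -= d * math.atan(scale)         # table lookup in hardware

    # Compensate for the accumulated gain.
    return x / gain, y / gain
```

Rotating the unit vector `(1.0, 0.0)` by an angle yields an approximation of that angle's cosine and sine, which is how a CORDIC unit can stand in for a twiddle-factor multiplication in an FFT butterfly.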
3.
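The floating-point adder is an important unit in the data path of integrated circuits, and its performance and power consumption strongly affect the performance of processors and digital signal processors. This paper analyzes several floating-point adder structures, focusing on a three-data-path structure that achieves low power consumption. Finally, the practicality of the floating-point adder structures is analyzed.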
4.
5.
6.
7.
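A new method for FPGA architecture research based on the Monte Carlo technique is proposed. The method randomly generates uniformly distributed open-circuit faults in the routing resources and routes interconnections around these obstacles, without relying on CAD algorithms or benchmark circuits. A case study of switch block topology analysis shows that the method reaches the same conclusions as the CAD approach, while the evaluation time is reduced from 15 hours to 15 minutes.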
8.
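To address the low flexibility, error-proneness, and insufficient automation of current FPGA architecture design methods, an FPGA architecture design method is proposed, and the EDA tool VA is implemented based on it. VA uses a GUI to edit architecture description files, can automatically generate a detailed FPGA architecture from an architecture description file, and supports the design of heterogeneous routing architectures through local adjustment of the FPGA architecture in the GUI. VA integrates FPGA architecture design and architecture evaluation and provides a fully automated evaluation flow. With VA, the independently developed FPGA chip VS1000 was successfully designed; the design process and results demonstrate the efficiency and correctness of VA.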
9.
Reddy B. Naresh Kumar Seetharamulu B. Krishna G. Siva Vani B. Veena 《Wireless Personal Communications》2022,125(4):3379-3391
Wireless Personal Communications - The optimization of VLSI design is playing an important role in the development of technological applications. The optimization of VLSI technology helps to...
10.
Ardavan Pedram John D. McCalpin Andreas Gerstlauer 《Journal of Signal Processing Systems》2014,77(1-2):169-190
FFT algorithms have memory access patterns that prevent many architectures from achieving high computational utilization, particularly when parallel processing is required to achieve the desired levels of performance. Starting with a highly efficient hybrid linear algebra/FFT core, we co-design the on-chip memory hierarchy, on-chip interconnect, and FFT algorithms for a multicore FFT processor. We show that it is possible to achieve excellent parallel scaling while maintaining power and area efficiency comparable to that of the single-core solution. The result is an architecture that can effectively use up to 16 hybrid cores for transform sizes that can be contained in on-chip SRAM. When configured with 12 MiB of on-chip SRAM, our technology evaluation shows that the proposed 16-core FFT accelerator should sustain 388 GFLOPS of nominal double-precision performance, with power and area efficiencies of 30 GFLOPS/W and 2.66 GFLOPS/mm², respectively.
11.
Gayasen A. Narayanan V. Kandemir M. Rahman A. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(7):882-893
Three-dimensional (3-D) integration is an attractive technology to reduce wirelengths in a field-programmable gate array (FPGA). However, it suffers from two problems: first, the inter-layer vias are limited in number, and second, the increased power density leads to high junction temperatures. In this paper, we tackle the first problem by designing switch boxes that maximize the use of the vias. Compared to the previously used subset switch box, our best switch box reduces the number of vias by about 49% and the area-delay product by about 9%. For the second problem, we utilize the difference in power densities between CLBs and some of the hard blocks in modern FPGAs to distribute the power more uniformly across the FPGA. The peak temperature in a two-layer FPGA drops by about 16 °C after our change.
12.
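This paper presents a hardware implementation structure for an FPGA-based Kalman filter. Because the design uses high-performance arithmetic processing units and a pipelined structure, it achieves efficient results: each processing cycle takes about 57.98 ns, 3 to 4 orders of magnitude faster than the software implementation. Resource sharing is also used, reducing the area by about 45% compared with the design without resource sharing.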
13.
An FIR Filter Architecture Implemented in FPGA  Total citations: 1, self-citations: 0, citations by others: 1
A digital FIR filter architecture implemented in FPGA is described. The FIR architecture is based on a pipelined multiply-add-accumulator (MAC) which employs a carry-save array. To save delay time and hardware resources, the multiplier uses the partial products generated by the modified Booth algorithm. The FIR architecture is written in VHDL and synthesized into an FPGA. The synthesis result shows that the proposed FIR architecture can run at a 50 MHz clock rate in the FPGA XC4025e-2.
14.
《IEEE transactions on circuits and systems. I, Regular papers》2009,56(11):2425-2438
15.
High-throughput Block Turbo Decoding: From Full-parallel Architecture to FPGA Prototyping  Total citations: 1, self-citations: 0, citations by others: 1
Camille Leroux Christophe Jégo Patrick Adde Michel Jézéquel 《Journal of Signal Processing Systems》2009,57(3):349-361
Ultra high-speed block turbo decoder architectures meet the demand for even higher data rates and open up new opportunities for the next generations of communication systems such as fiber optic transmissions. This paper presents the implementation, onto an FPGA device, of an ultra high throughput block turbo code decoder. An innovative architecture of a block turbo decoder which enables the memory blocks between all half-iterations to be removed is presented. A complexity analysis of the elementary decoder leads to a low complexity decoder architecture for a negligible performance degradation. The resulting turbo decoder is implemented on a Xilinx Virtex II-Pro FPGA in a communication experimental setup which also includes an innovative parallel product encoder. The implemented block turbo decoder processes input data at 600 Mb/s. The component code is an extended Bose, Ray-Chaudhuri, Hocquenghem (eBCH(16,11)) code. Some solutions to reach even higher data rates are finally presented.
16.
Maya B. Gokhale Janice M. Stone Edson Gomersall 《The Journal of VLSI Signal Processing》2000,24(2-3):165-180
Hybrid architectures combining conventional processors with configurable logic resources enable efficient coordination of control with datapath computation. With integration of the two components on a single device, housekeeping tasks and, optionally, loop control and data-dependent branching, can be handled by the conventional processor, while regular datapath computation occurs on the configurable hardware. This paper describes a novel approach to programming such hybrid devices that gives the programmer control over mapping of data and computation between conventional processor and configurable logic. With a simple set of pragma and intrinsic function directives, the NAPA C language provides for manual control over perhaps the most important aspect of programming such hybrid devices. Alternatively, as experience is gained about tradeoffs between the two computational resources, mapping directives may eventually be generated by an external tool. The paper further describes a research prototype compiler that targets the hybrid processor model, with a concrete implementation for the National Semiconductor NAPA1000 chip. The NAPA C compiler parses the mapping directives, performs semantic analysis, and co-synthesizes a conventional processor executable combined with a configuration bit stream for the configurable logic. Two major compiler phases, the synthesis of pipelined loops and the datapath synthesis, are described in detail.
17.
System Modeling Based on FPGA and the LMS Algorithm  Total citations: 1, self-citations: 1, citations by others: 1
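An adaptive filter is used to model an unknown system. The implementation of the LMS algorithm is simulated with Simulink in Matlab, and the LMS algorithm and the system modeling are implemented in an FPGA. The modeling results of the FPGA design are then simulated in Matlab to supplement the simulation capability of Quartus, yielding complete and intuitive simulation results. The simulation, implementation, and verification methods used for this system modeling are equally applicable to removing narrowband interference from wideband signals, adaptive line enhancement, and adaptive equalization, and therefore have a degree of generality.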
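To make the idea of LMS-based system modeling concrete, here is a minimal software sketch of the standard LMS update used for system identification: the adaptive filter sees the same input as the unknown system and adjusts its taps to shrink the error between the two outputs. It is not the FPGA or Simulink design from the entry above. The function name `lms_identify`, the tap count, and the step size `mu` are hypothetical choices for illustration.

```python
def lms_identify(x, d, num_taps, mu):
    """Estimate FIR taps of an unknown system with the LMS algorithm.

    x: input samples fed to both the unknown system and the adaptive filter
    d: output samples of the unknown system (the desired signal)
    Returns the final tap estimates and the per-sample error history.
    """
    w = [0.0] * num_taps        # adaptive filter taps, start from zero
    buf = [0.0] * num_taps      # delay line holding the most recent inputs
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]                           # shift in the new sample
        y = sum(wi * bi for wi, bi in zip(w, buf))      # adaptive filter output
        e = dn - y                                      # modeling error
        # LMS update: move each tap along the instantaneous gradient estimate.
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors
```

With a noise-free desired signal and a sufficiently small `mu`, the tap estimates should converge toward the unknown system's impulse response; too large a step size makes the update unstable.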
18.
19.
We describe a methodology to design and optimize a three-dimensional (3D) Tree-based FPGA by introducing a break-point at a particular tree-level interconnect to optimize the speed, area, and power consumption. The ability of the design flow to decide a horizontal or vertical network break-point based on design specifications is a defining feature of our design methodology. The vertical partitioning is organized in such a way as to balance the placement of logic blocks and switch blocks into multiple tiers, while the horizontal partitioning optimizes the interconnect delay by segregating the logic blocks and programmable interconnect resources into multiple tiers to build a 3D stacked Tree-based FPGA. We finally evaluate the effect of Look-Up-Table (LUT) size, cluster size, speed, area and power consumption of the proposed 3D Tree-based FPGA using our home-grown experimental flow and show that the horizontally partitioned 3D stacked Tree-based FPGA with LUT and cluster sizes equal to 4 has the best area-delay product to design and manufacture 3D Tree-based FPGA.
20.
Dual-Context-Window Parallel Coding for EBCOT and Its FPGA Implementation  Total citations: 1, self-citations: 0, citations by others: 1
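In a JPEG2000 coding system, the coding speed of EBCOT has become the bottleneck of the overall coding efficiency. By studying the EBCOT coding principle and the coding process of the pass-parallel algorithm, a bit-parallel EBCOT coefficient bit modeling method with dual context windows is proposed. The hardware structure of a coefficient bit modeling system using this algorithm is described in detail. The coefficient bit coding system effectively reduces the number of coding clock cycles and has been functionally verified on an FPGA.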