Similar Literature
 Found 20 similar documents; search took 15 ms
1.
2.
Field programmable gate arrays (FPGAs) are continuously gaining momentum and becoming an essential part of today’s digital systems and applications. The growing use of these devices, coupled with increasingly complex and integrated designs, calls for techniques that use their internal resources efficiently. Standard HDL coding techniques and synthesis tools map logic onto the look-up table (LUT) based architecture. The resulting design uses more area on the chip, while some fast, dedicated resources of the chip remain unused. This in turn results in slower clock rates and longer critical paths, so the design remains inefficient in terms of both speed and area. In this paper we present and discuss techniques to effectively utilize the FPGA's dedicated resources in order to increase achievable clock rates and reduce FPGA area utilization. Various useful HDL constructs are presented that utilize dedicated hardware resources of modern Xilinx FPGAs. Optimization techniques are presented with implementation examples and corresponding quantitative performance evaluation. In most cases we achieved a 50% reduction in chip area utilization and simultaneously improved timing results significantly.

3.
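To address the high processing speed and large data volume required of high-speed vision measurement systems, this work combines FPGA and DSP technology and studies a multi-channel, parallel-processing high-speed vision measurement system based on an FPGA and multiple DSPs. It describes in detail the different roles of the FPGA and the multiple DSPs in digital image processing, the overall structure of the high-speed vision measurement system, and the working principle of each part.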

4.
This paper presents an architecture for the on-chip extraction of visual primitives: energy, orientation, disparity, and optical flow. This cost-optimized architecture processes high-resolution images in real time for real-life applications. In fact, we present a versatile architecture that may be customized for different performance requirements depending on the target application. In this case, dedicated hardware and its potential on-chip implementation on FPGA devices become an efficient solution. We have developed a multi-scale approach for the computation of the gradient-based primitives. Gradient-based methods are very popular in the literature because they provide a very competitive accuracy vs. efficiency trade-off. The hardware implementation of the system uses superscalar fine-grain pipelines to exploit the maximum degree of parallelism provided by the FPGA. The system reaches 350 and 270 VGA frames per second (fps) for the disparity and optical flow computations respectively in their mono-scale version, and up to 32 fps for the multi-scale scheme extracting all the described features in parallel. In this work we also analyze the accuracy and hardware resource usage of the proposed implementation.

5.
6.
In this paper, we describe a semi-automatic method for designing a programmable architecture for high speed communication protocols. A case study of an associative-based architecture for a high speed communication system is presented with a validation environment. The environment provides an estimation using a XILINX prototyping board that includes memories (content addressable memory, CAM, RAM, DPRAM). In our approach, we aim to prototype such an architecture rapidly and to let the designer interact easily in order to customize the architecture according to application requirements. This method of validation provides important benefits in hardware prototyping: a better validation environment and reduced time to obtain a realistic estimation for a large variety of applications.

7.
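Hardware Design and Implementation of a Road Recognition Algorithm   Total citations: 1 (self-citations: 1, citations by others: 1)
Hardware implementation of vehicle visual navigation algorithms is of practical significance and is currently an active research topic. To overcome the drawbacks of traditional hardware design of algorithms, namely complex implementation, difficult debugging, and high demands on designers, this work studies a hardware design and implementation method for complex algorithms based on a high-level language. It analyzes the FPGA hardware design and implementation process of a road recognition algorithm based on the Handel-C language and validates it experimentally. The experimental results show that, compared with currently used design methods such as VHDL, this method offers flexible design, a short development cycle, and reasonable resource utilization, and also facilitates hardware/software co-design.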

8.
Nowadays, high performance System and Local Area Networks (SAN/LAN) have to serve heterogeneous traffic consisting of information flows with different bandwidth and latency requirements. This makes it necessary to provide Quality of Service (QoS) and optimize the design of network components. In this paper we present a hardware tool designed to analyze the performance of QoS networks under given traffic conditions and server models. In particular, a reprogrammable multimedia traffic Generator/Monitor platform has been built. This permits prototyping the communication system of a high speed LAN/SAN on a single FPGA device. Hence, it can be used at design time to produce more efficient devices. To illustrate the applicability of the platform we have used the Simple Multimedia Router (SMMR), an existing proposal to provide QoS. The modular structure of the tool and the fact that it has been implemented on an FPGA using a high level hardware programming language make it flexible, scalable and easy to reconfigure. Besides, the architecture and implementation can be adapted for use in more recent QoS NoC environments.

9.
Low-Density Parity-Check (LDPC) codes, with their excellent error-correction capabilities, have been widely used in both data communication and storage to construct reliable cyber-physical systems that are resilient to real-world noise. Fast prototyping of field-programmable gate array (FPGA)-based decoders is essential to achieve high decoding performance while accelerating the development process. This paper proposes a three-level parallel architecture, TLP-LDPC, to achieve high throughput by fully exploiting the characteristics of both LDPC and the underlying hardware while effectively scaling to large FPGA platforms. The three-level parallel architecture contains a low-level decoding unit, a mid-level multi-unit decoding core, and a high-level multi-core decoder. The low-level decoding unit is a basic LDPC computation component that effectively combines the features of the LDPC algorithm with the specific structures (e.g., Look-Up Tables, LUTs) of the FPGA and eliminates potential data conflicts. The mid-level decoding core integrates the input/output and multiple decoding units in a well-balanced pipelined fashion. The top-level multi-core architecture makes full use of board-level resources to improve the overall throughput. We develop LDPC C++ code with dedicated pragmas and leverage HLS tools to implement the TLP-LDPC architecture. Experimental results show that TLP-LDPC achieves 9.63 Gbps end-to-end decoding throughput on a Xilinx Alveo U50 platform, 3.9x higher than existing HLS-based FPGA implementations.
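As background for the decoding units mentioned in this abstract, the following is a minimal sketch of the parity check that any LDPC decoder must satisfy: a word is a valid codeword only if every row of the parity-check matrix sums to zero modulo 2. The matrix `H` here is the small (7,4) Hamming parity-check matrix, chosen purely as an illustrative stand-in; it is not low-density and is not taken from the paper, and the function name `syndrome` is hypothetical.

```python
# Illustrative parity-check matrix: the (7,4) Hamming code.
# Assumption: used only as a tiny stand-in for a real (much larger, sparse) LDPC matrix.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(parity_matrix, word):
    """Return one parity bit per check row: sum of selected bits modulo 2."""
    return [
        sum(h * bit for h, bit in zip(row, word)) % 2
        for row in parity_matrix
    ]


valid = [1, 1, 1, 0, 0, 0, 0]      # satisfies all three checks
corrupted = [1, 1, 1, 0, 1, 0, 0]  # same word with the bit at index 4 flipped

print(syndrome(H, valid))
print(syndrome(H, corrupted))
```

An all-zero syndrome means every check is satisfied; an iterative decoder repeats its update step until this condition holds or an iteration limit is reached. Each check row can be evaluated independently, which is the kind of parallelism a hardware decoder exploits.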

10.
Biologically-inspired packet switched network on chip (NoC) based hardware spiking neural network (SNN) architectures have been proposed as an embedded computing platform for classification, estimation and control applications. Storage of large synaptic connectivity (SNN topology) information in SNNs requires large distributed on-chip memory, which poses serious challenges for compact hardware implementation of such architectures. Based on the structured neural organisation observed in the human brain, a modular neural network (MNN) design strategy partitions complex application tasks into smaller subtasks executing on distinct neural network modules, and integrates intermediate outputs in higher level functions. This paper proposes a hardware modular neural tile (MNT) architecture that reduces the SNN topology memory requirement of NoC-based hardware SNNs by using a combination of fixed and configurable synaptic connections. The proposed MNT contains a 16:16 fully-connected feed-forward SNN structure and integrates into a mesh topology NoC communication infrastructure. The SNN topology memory requirement is 50 % of that of the monolithic NoC-based hardware SNN implementation. The paper also presents a lookup table based SNN topology memory allocation technique, which further increases the memory utilisation efficiency. Overall, the area requirement of the architecture is reduced by an average of 66 % for practical SNN application topologies. The paper presents micro-architecture details of the proposed MNT and digital neuron circuit. The proposed architecture has been validated on a Xilinx Virtex-6 FPGA and synthesised using 65 nm low-power CMOS technology. The evolvable capability of the proposed MNT and its suitability for executing subtasks within an MNN execution architecture is demonstrated by successfully evolving benchmark SNN application tasks representing classification and non-linear control functions.
The paper addresses hardware modular SNN design and implementation challenges and contributes to the development of a compact hardware modular SNN architecture suitable for embedded applications.

11.
The design flow of a digital cryptographic device must take into account the evaluation of its security against attacks based on side-channel observation. The adoption of high level countermeasures, as well as the verification of the feasibility of new attacks, presently requires time-consuming physical measurements on the prototype product or simulation at a low abstraction level. Starting from these assumptions, we developed an exploration approach centered on high level simulation in order to evaluate the actual implementation of a cryptographic algorithm, be it software or hardware based. The simulation is performed within a unified tool based on SystemC that can model a software implementation running on a microprocessor-based architecture, a dedicated hardware implementation, or mixed software-hardware implementations, with cycle-accurate resolution. Here we describe the tool and provide a large set of design explorations and characterizations based on actual implementations of the AES cryptographic algorithm, demonstrating how the large set of experiments made possible by the fast simulation engine can lead to important improvements in understanding and identifying weaknesses in cryptographic algorithm implementations.

12.
13.
14.
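An FPGA-Based High-Speed Network Intrusion Detection System   Total citations: 6 (self-citations: 1, citations by others: 5)
Processing speed has become the bottleneck limiting the performance of software-based network intrusion detection systems. This paper proposes a network intrusion detection system architecture implemented with reconfigurable hardware (FPGA) and a commercial Gigabit Ethernet MAC. In this architecture, computationally intensive tasks such as signature matching of network packets and complex protocol analysis are carried out by reconfigurable hardware circuits, leaving the host CPU to focus on detecting complex intrusion patterns and responding to intrusions in real time. Analysis shows that the architecture can quickly adapt to the hardware reconfiguration required by changes in intrusion signatures, enabling the network intrusion detection system to process network packets at line rate.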
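To make the signature-matching workload in this abstract concrete, here is a minimal software sketch of what such matching computes: each packet payload is checked against a set of byte-pattern signatures. The function name, the signature names, and the patterns are hypothetical and chosen only for illustration; the paper performs this step in reconfigurable hardware, and this sketch makes no claim about how its circuits are organized.

```python
def match_signatures(payload: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]


# Hypothetical signature set, for illustration only.
SIGNATURES = {
    "dir-traversal": b"../..",
    "shell-exec": b"/bin/sh",
}

packet = b"GET /../../etc/passwd HTTP/1.0"
print(match_signatures(packet, SIGNATURES))
```

In software, the cost grows with both the number of signatures and the payload length, which is the bottleneck the abstract refers to. A hardware matcher can compare against many patterns at once as the bytes stream through.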

15.
The Finite Element Method (FEM) is a computationally intensive scientific and engineering analysis tool that has diverse applications ranging from structural engineering to electromagnetic simulation. The trends in floating-point performance are moving in favor of Field-Programmable Gate Arrays (FPGAs), hence interest in exploiting this technology has grown in the scientific community. We present an architecture and implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from FEM applications. FEM matrices display specific sparsity patterns that can be exploited to improve the efficiency of hardware designs. Our architecture exploits FEM matrix sparsity structure to achieve a balance between performance and hardware resource requirements by relying on external SDRAM for data storage while utilizing the FPGA's computational resources in a stream-through systolic approach. The architecture is based on a pipelined linear array of processing elements (PEs) coupled with a hardware-oriented matrix striping algorithm and a partitioning scheme which enables it to process arbitrarily large matrices without changing the number of PEs in the architecture. Therefore, this architecture is limited only by the amount of external RAM available to the FPGA. The implemented SMVM-pipeline prototype contains 8 PEs and is clocked at 110 MHz, obtaining a peak performance of 1.76 GFLOPS. For 8 GB/s of memory bandwidth, typical of recent FPGA systems, this architecture can achieve 1.5 GFLOPS sustained performance. Using multiple instances of the pipeline, linear scaling of the peak and sustained performance can be achieved.
Our stream-through architecture provides the added advantage of enabling an iterative implementation of the SMVM computation required by iterative solution techniques such as the conjugate gradient method, avoiding initialization time due to data loading and setup inside the FPGA internal memory.
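As a point of reference for the SMVM operation described in this abstract, the sketch below shows a plain software sparse matrix-vector product over the common compressed sparse row (CSR) layout, together with a back-of-the-envelope check of the quoted peak figure. CSR is used here only as a familiar format; the paper's own striping and partitioning scheme is not reproduced. The peak calculation assumes each PE completes one multiply and one add per clock cycle, which is an assumption that happens to reproduce the stated 1.76 GFLOPS, not a detail confirmed by the abstract.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        # Non-zeros of row r occupy positions row_ptr[r] .. row_ptr[r + 1] - 1.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def peak_gflops(num_pes, clock_mhz, flops_per_pe_per_cycle=2):
    """Peak rate in GFLOPS; the default of 2 FLOPs per PE per cycle is an assumption."""
    return num_pes * clock_mhz * flops_per_pe_per_cycle / 1000.0


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
print(peak_gflops(8, 110))
```

Each non-zero contributes exactly one multiply and one add, so sustained throughput depends on how quickly non-zeros can be streamed from memory. That is consistent with the abstract tying sustained performance to memory bandwidth.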

16.
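Hardware Design of an Evaluation System for Digital Image Processing Algorithms   Total citations: 1 (self-citations: 1, citations by others: 0)
To evaluate different digital image processing algorithms, a USB 2.0 bus is used to transfer digital image data to the digital image processing system, and the hardware design uses a DSP plus an FPGA to carry out the image processing tasks. The overall system offers strong processing capability and good reproducibility, and can be used to evaluate a variety of image processing algorithms.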

17.
A high performance digital architecture for the implementation of a non-linear image enhancement technique is proposed in this paper. The image enhancement is based on a luminance dependent non-linear enhancement algorithm which achieves simultaneous dynamic range compression, colour consistency and lightness rendition. The algorithm provides better colour fidelity, enhances less noise, prevents the unwanted luminance drop in uniform luminance areas, keeps the ‘bright’ background unaffected, and enhances the ‘dark’ objects against a ‘bright’ background. The algorithm contains a large number of complex computations and thus requires specialized hardware implementation for real-time applications. Systolic, pipelined and parallel design techniques are utilized effectively in the proposed FPGA-based architectural design to achieve real-time performance. Estimation techniques are also utilized in the hardware algorithmic design to achieve a faster, simpler and more efficient architecture. The video enhancement system is implemented using Xilinx’s multimedia development board that contains a VirtexII-X2000 FPGA, and it is capable of processing approximately 67 Mega-pixels (Mpixels) per second.

18.
19.
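With the development of processor architectures, high-performance heterogeneous multi-core processors keep emerging. Because the design of such processors is very complex, a field programmable gate array (FPGA) prototype verification platform is usually built in order to reduce design risk, shorten the verification cycle, start software development early, and reproduce post-silicon problems; a wide variety of hardware/software co-verification and debugging work is then carried out on the FPGA platform. The proposed method for FPGA debugging and verification of heterogeneous multi-core high-performance processors on a homogeneous FPGA platform makes effective use of the architectural features of the heterogeneous multi-core processor and the symmetry of the homogeneous FPGAs, partitioning the FPGAs top-down in a hierarchical manner and building the FPGA platform bottom-up. Combined with techniques such as speed-difference bridges, adaptive delay adjustment, and an embedded virtual logic analyzer (VLA), the FPGA platform can be brought up and deployed quickly. The proposed methods, such as multi-core complementation and a debugging SHELL that emulates inter-core substitution, allow fast and complete FPGA verification of the target high-performance heterogeneous multi-core processor. Using this FPGA prototype verification platform, pre-silicon verification, hardware/software co-development and testing, and post-silicon problem reproduction were completed successfully, and the platform provides a fast hardware platform for the architecture design of the next-generation processor.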

20.
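Digital image processing is widely used in aerospace, biomedical engineering, communications engineering, industry and engineering, military and public security, culture and the arts, and other fields. Because of the real-time and environmental requirements of some applications, digital signal processors (DSPs) are commonly used to process images. DSPs with a very long instruction word (VLIW) architecture are widely used in real-time image processing because of their low power consumption, simple hardware structure, and good parallelism. Based on the characteristics of image processing algorithms and of the VLIW DSP architecture, this paper proposes a general method for optimizing image processing algorithms on VLIW DSPs, including memory optimization methods and instruction-level parallelism optimization methods. Finally, the proposed method is applied to optimize several commonly used image processing algorithms, and the experimental results show good optimization effects.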
