Similar documents
1.
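Automatic Synthesis for Data-Driven Processor Arrays   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes a data-driven processor array architecture that effectively balances storage and computation and is well suited to high-performance algorithm acceleration on FPGAs. It also proposes an automatic synthesis framework targeting this architecture, through which conventional loops can be efficiently mapped onto the data-driven processor array. Experimental results demonstrate the effectiveness of the framework, and the generated designs outperform general-purpose processors.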

2.
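Application of Pipelined Configuration in Reconfigurable Processors   Total citations: 1, self-citations: 1, citations by others: 0
A pipelined configuration technique for reconfigurable processors is proposed that effectively reduces configuration time and increases application execution speed. The reconfigurable processor consists of a general-purpose processor and a coarse-grained reconfigurable array. The array executes the loops that account for most of an application's execution time; these loops are decomposed into different rows and executed on the array in a pipelined fashion. The technique was validated on an FPGA verification system. The applications tested include the integer discrete cosine transform and motion estimation from the H.264 benchmark. Compared with the conventional reconfigurable processors PipeRench and MorphoSys and with TI's DSP TMS320DM642, it achieves roughly a 3.5× performance improvement.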

3.
4.
This paper deals with the design of processor arrays for regular algorithms. The design is constrained by the limited implementation cost that characterizes a reconfigurable architecture. The objective of the design is to minimize the latency of the processor array. The presented approach for determining a scheduling function that yields the minimal latency of the processor array is formulated as a linear program that incorporates 1) the selection of modules to be implemented in processors to execute operations of the algorithm, 2) the binding of operations to modules, 3) the computation of the number of registers, 4) the limitation of implementation cost for modules and registers, and 5) the determination of the partition size that matches the limited implementation cost.
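The latency objective can be illustrated with a linear schedule t(i) = λ·i over a rectangular index space. The sketch below makes several simplifying assumptions:

- It enumerates small integer schedule vectors instead of solving the paper's linear program.
- It enforces only the causality constraint λ·d ≥ 1 for every dependence vector d.
- It ignores module selection, binding, registers, cost limits and partitioning.

The function names are hypothetical.

```python
from itertools import product

def latency(lam, bounds):
    """Latency of the linear schedule t(i) = lam . i over the box
    0 <= i_k < bounds[k], assuming every operation takes one time step."""
    # A linear function reaches its extremes at the corners of the box.
    corners = product(*[(0, n - 1) for n in bounds])
    times = [sum(l * c for l, c in zip(lam, corner)) for corner in corners]
    return max(times) - min(times) + 1

def best_schedule(deps, bounds, search=range(-3, 4)):
    """Brute-force search over small integer schedule vectors: keep those where
    every dependence vector d satisfies lam . d >= 1 (a consumer runs strictly
    after its producer) and return (lam, latency) with the smallest latency."""
    best = None
    for lam in product(search, repeat=len(bounds)):
        if all(sum(l * d for l, d in zip(lam, dep)) >= 1 for dep in deps):
            lat = latency(lam, bounds)
            if best is None or lat < best[1]:
                best = (lam, lat)
    return best

# A 4 x 4 recurrence with dependences along both axes.
print(best_schedule([(1, 0), (0, 1)], (4, 4)))  # ((1, 1), 7)
```

For a real design, the search space and the resource terms would come from the linear-program formulation described in the abstract. The enumeration here only shows what "minimal latency under dependence constraints" means.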

5.
This paper describes the development of efficient hardware/software (HW/SW) neuro-fuzzy systems. The model used in this work is an adaptive neuro-fuzzy inference system modified for efficient HW/SW implementation. Two different on-chip designs are presented: a high-performance parallel architecture for offline training and a pipelined architecture suitable for online parameter adaptation. Details of important aspects of the design of HW/SW solutions are given. The proposed architectures have been implemented on a system-on-a-programmable-chip. The device contains an embedded-processor core and a large field programmable gate array (FPGA). The processor provides flexibility and high precision for implementing the learning algorithms, while the FPGA allows the development of high-speed inference architectures for real-time embedded applications.

6.
This paper presents a field programmable gate array (FPGA) implementation of a three-layer perceptron using the few DSP blocks and few block RAMs (FDFM) approach, implemented in a Xilinx Virtex-6 family FPGA. In the FDFM approach, multiple processor cores with few DSP slices and few block RAMs are used. We have implemented 150 processor cores for perceptrons in a Xilinx Virtex-6 family FPGA XC6VLX240T-FF1156. The implementation results show that the 150 processor cores for 32-32-32 input–hidden–output layer perceptrons can be implemented in the FPGA using 150 DSP48 slices, 185 block RAMs and 9676 slices. The design runs at a clock frequency of 242.89 MHz, and a single evaluation of the 150-node perceptron can be performed 1.65 × 10⁷ times per second.
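For reference, here is a floating-point sketch of the computation each core performs: a 32-32-32 three-layer perceptron forward pass. The following are assumptions not stated in the abstract:

- The sigmoid activation.
- The random weights.
- The helper names.

The FPGA cores would use fixed-point DSP48 arithmetic rather than Python floats. `mac_count` shows how many multiply-accumulates one evaluation needs.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights and biases for a fully connected layer (illustration only)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases

def layer(inputs, weights, biases):
    """Each output node computes a weighted sum (multiply-accumulate) plus a bias,
    then applies a sigmoid activation (assumed)."""
    return [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
            for row, b in zip(weights, biases)]

def perceptron(x, hidden, output):
    """Three-layer perceptron: input -> hidden -> output."""
    return layer(layer(x, *hidden), *output)

def mac_count(sizes):
    """Number of weights, i.e. multiply-accumulates per evaluation."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

rng = random.Random(0)
hidden = make_layer(32, 32, rng)
output = make_layer(32, 32, rng)
y = perceptron([0.5] * 32, hidden, output)
print(len(y), mac_count([32, 32, 32]))  # 32 2048
```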

7.
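As the resolution of images in space science experiments keeps increasing and data volumes keep growing, image data must be compressed on board the satellite before transmission. FPGAs offer low power consumption and high performance and are widely used in satellite payloads of many kinds, so image compression can be implemented on an FPGA. The core of an FPGA-based image compression algorithm is the DCT, which consumes a large amount of multiplier resources. To improve compression efficiency while reducing dependence on dedicated multipliers, this paper makes full use of the FPGA's BRAM and LUT resources. It employs an improved distributed arithmetic algorithm, a pipelined structure and ping-pong operation to implement the DCT of the JPEG compression algorithm without multipliers, with good portability. Verification shows that, when used in an FPGA-based JPEG image compression system, the method is significantly faster than conventional fast DCT algorithms.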
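The key idea behind a multiplier-free DCT is that each DCT output is a dot product of fixed coefficients with the input samples. Distributed arithmetic evaluates such a product with a precomputed lookup table and shift-and-add steps. The sketch below is textbook distributed arithmetic, not the authors' improved variant. It assumes the coefficients have already been scaled to integers (fixed point) and the inputs are `bits`-wide two's-complement values.

```python
def build_da_lut(coeffs):
    """LUT entry k holds the sum of coeffs[i] for every bit i that is set in k.
    On an FPGA this table would live in LUTs or BRAM."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (k >> i) & 1) for k in range(1 << n)]

def da_dot(coeffs, xs, bits):
    """Compute sum(coeffs[i] * xs[i]) using only table lookups, shifts and
    additions/subtractions, processing one bit position of all inputs per step."""
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one LUT address.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        term = lut[addr] << b
        # The MSB of a two's-complement number carries negative weight.
        acc = acc - term if b == bits - 1 else acc + term
    return acc

print(da_dot([3, -5, 7, 2], [10, -4, -128, 127], 8))  # -592
```

The LUT grows as 2^n for n inputs. Practical designs therefore split an 8-point DCT row into smaller partial tables, which is one of the trade-offs an "improved" scheme would target.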

8.
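A synchronization design for system clock signals is presented. To improve system clock synchronization and system reliability, a field-programmable gate array (FPGA) replaces the traditional processor as the control core. The design uses a phase-locked loop (PLL) and the Verilog hardware description language to achieve clock synchronization on reset. Practical use shows that the design runs stably and reliably and is suitable for operation at high clock frequencies.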

9.
Modern field programmable gate array (FPGA) chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance VLIW (very long instruction word) processor core in an FPGA. With VLIW architecture, the processor's effectiveness depends on the ability of compilers to extract sufficient ILP (instruction-level parallelism) from program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors to shorten the development cycle, and to use the powerful FPGA resources to increase real-time performance. We present a flexible VLIW VHDL processor model with a variable instruction set and a customizable architecture. This allows the intrinsic parallelism of a target application to be exploited using advanced compiler technology and implemented in an optimal manner on FPGA. Some common image-processing algorithms were tested and validated using the proposed development cycle. We also realized the rapid prototyping of embedded contactless palmprint extraction on a Virtex-6 FPGA-based board for a biometric application and obtained a processing time of 145.6 ms per image. Our approach applies several criteria for co-design tools: flexibility, modularity, performance, and reusability.
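To show what "extracting ILP" means for a VLIW target, here is a minimal greedy list scheduler. It packs straight-line instructions into bundles of a given issue width while respecting read-after-write dependences. It makes these assumptions:

- Unit latency for every operation.
- Each register is written only once (so WAR/WAW hazards can be ignored).
- A hypothetical `(dest, [sources])` instruction format.

It is not the compiler technology used in the paper.

```python
def pack_bundles(instrs, width):
    """Greedy list scheduling of straight-line code into VLIW bundles.
    instrs: list of (dest, [sources]) in program order, each dest written once.
    Each instruction goes into the earliest bundle that follows all of its
    producers and still has a free issue slot."""
    ready_at = {}   # register -> first bundle index where its value is usable
    bundles = []
    for dest, srcs in instrs:
        cycle = max((ready_at.get(s, 0) for s in srcs), default=0)
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1  # bundle full, try the next one
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(dest)
        ready_at[dest] = cycle + 1  # unit latency assumed
    return bundles

prog = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["a"]), ("e", ["c", "d"])]
print(pack_bundles(prog, 2))  # [['a', 'b'], ['c', 'd'], ['e']]
```

With width 1 the same program needs five bundles. With width 2 it needs three, because `a`/`b` and `c`/`d` are independent and can issue together.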

10.
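A high-speed 16-bit parallel data acquisition interface for a PC is described. The interface consists of three parts:

- A high-speed optoelectronic isolation circuit.
- A dual-port FIFO buffer circuit.
- Computer interface logic and control circuitry built from an FPGA chip.

The interface uses optocouplers to isolate the terminal display-processing system from the front-end data-processing system, avoiding interference between them. This effectively solves the electromagnetic interference problem in high-speed 16-bit parallel data transfer and the problem of transferring large data volumes efficiently in real time. Using an FPGA chip shifts hardware design into software. This both realizes complex logic functions and reduces the size of the hardware circuitry, improving system reliability. The interface has good application value in complex systems such as radar and sonar.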
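The role of the dual-port FIFO buffer can be modelled in software as a fixed-depth ring buffer. One side writes 16-bit words and the other side reads them, each gated by full/empty flags. This is only a behavioural sketch: the class and its methods are hypothetical, and the independent read/write clocks of a real dual-port FIFO chip are not modelled.

```python
class Fifo:
    """Fixed-depth FIFO modelled as a ring buffer of 16-bit words."""

    def __init__(self, depth):
        self.buf = [0] * depth
        self.depth = depth
        self.wr = 0      # next slot to write
        self.rd = 0      # next slot to read
        self.count = 0   # words currently stored

    @property
    def full(self):
        return self.count == self.depth

    @property
    def empty(self):
        return self.count == 0

    def write(self, word):
        """Store one word; return False (writer must stall) when full."""
        if self.full:
            return False
        self.buf[self.wr] = word & 0xFFFF  # keep the 16-bit data width
        self.wr = (self.wr + 1) % self.depth
        self.count += 1
        return True

    def read(self):
        """Return the oldest word, or None when the FIFO is empty."""
        if self.empty:
            return None
        word = self.buf[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return word

f = Fifo(4)
for v in (1, 2, 3, 4):
    f.write(v)
print(f.full, f.write(5), f.read())  # True False 1
```

The buffer decouples the bursty front-end producer from the PC-side consumer. That decoupling is why a FIFO sits between the isolation stage and the FPGA interface logic.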

11.
Language processor generators are systems that produce various language processors (including compilers) on the basis of a high-level specification. The design of language processor generators is discussed on the basis of experiments with a traditional compiler writing system (HLP78) employing pure LALR parsing and general attribute grammars. It is argued that these methods are too primitive from a practical point of view. The design of a new language processor generator, HLP84, is based on this view. This system is an attempt to provide high-level tools for a restricted class of applications (one-pass analysis). The syntactic facilities include regular expressions on the right-hand sides of productions, a disambiguating mechanism that is integrated with regular expressions, and a mechanism for using semantic information to aid parsing. The semantic facilities include automatic support for semantic error handling and for symbol tables. Early experiences with the new system show that, in spite of the general overhead caused by the higher automation level, the system allows the generation of reasonably efficient processors.

12.
The continued growth of both wired and wireless communications has driven the demand for high-speed security implementations. RIPEMD hash functions are widely used in many cryptographic applications. This work proposes a reconfigurable processor architecture and a VLSI implementation of these functions. The processor is reconfigurable in the sense that it can alternatively perform any of the RIPEMD hash functions. To show the advantages of the proposed design, each of these hash functions has also been implemented in a separate hardware device (FPGA). The FPGA implementation of the proposed processor achieves hashing speeds of up to 2 Gbps. Compared with previously published hardware designs, the proposed processor delivers 22 to 30 times higher performance. It also performs much better than assembly-language implementations of RIPEMD-128 and RIPEMD-160. The proposed processor could be used to implement data integrity units and in many other sensitive cryptographic applications, such as digital signatures, message authentication codes and random number generators.

13.
A general-purpose processor cell, called DOP, is presented. The DOP architecture is designed to efficiently support high-level programming languages (HLLs) such as C or Pascal while still being simple enough to be implemented on a single field programmable gate array (FPGA). Special attention is paid to the analysis of HLL requirements on processors. The DOP is designed to be used as a building block (cell) in an FPGA library. Its simplicity allows other microcomputer functional units to be implemented on the same FPGA. The DOP serves as a core for simple solutions using currently available technology.

14.
This paper proposes a high-speed multi-level-parallel array processor for programmable vision chips. The processor includes a 2-D pixel-parallel processing element (PE) array and a 1-D row-parallel row processor (RP) array. Both arrays operate in a single-instruction multiple-data (SIMD) fashion and share a common instruction decoder. The sizes of the arrays can be scaled to suit the target application. In the PE array, each PE can communicate not only with its nearest-neighbor PEs but also with the next-nearest-neighbor PEs in the diagonal directions. This connectivity helps speed up local operations in low-level image processing. Global operations in mid-level processing, on the other hand, are accelerated by the skipping chain and binary boosters in the RP array. The array processor was implemented on an FPGA device and was successfully tested on various algorithms, including real-time face detection based on the PPED algorithm. The results show that the image processing speed of the proposed processor is much higher than that of state-of-the-art digital vision chips.
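The benefit of the diagonal links can be seen in a minimal emulation of one SIMD instruction on a 2-D PE array. Every PE reads its own value and those of its linked neighbours from the previous state and applies the same operation. This is an illustrative sketch: the function and offset tables are hypothetical, and out-of-array neighbours are simply omitted. Binary dilation with `max` is used as the local operation.

```python
# Nearest-neighbour links only, and nearest plus diagonal links.
NEIGHBOURS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NEIGHBOURS_8 = NEIGHBOURS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def simd_step(grid, op, offsets=NEIGHBOURS_8):
    """One synchronous SIMD step: every PE applies `op` to its own value and
    its linked neighbours' values, all read from the old grid."""
    h, w = len(grid), len(grid[0])
    return [[op([grid[y][x]] + [grid[y + dy][x + dx] for dy, dx in offsets
                                if 0 <= y + dy < h and 0 <= x + dx < w])
             for x in range(w)] for y in range(h)]

g = [[0] * 5 for _ in range(5)]
g[2][2] = 1
print(sum(map(sum, simd_step(g, max))),                 # 9: full 3x3 dilation
      sum(map(sum, simd_step(g, max, NEIGHBOURS_4))))   # 5: cross shape only
```

With only 4-neighbour links, a 3x3 (8-connected) operation needs an extra step to move diagonal values. The diagonal connections let each PE finish the operation in one instruction.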

15.
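To achieve autonomous navigation of an automatic guided vehicle (AGV), an embedded tracker based on omnidirectional vision is designed. The tracker integrates a fisheye lens with a field-programmable gate array (FPGA), a digital signal processor (DSP) and a CMOS image sensor, resulting in a compact, structured design. An improved tracking algorithm combining mean shift and particle filtering is ported to the tracker to achieve dynamic multi-target tracking on fisheye images. Analysis and comparison with conventional algorithms demonstrate the advantages of the improved algorithm. Experimental results show that the tracker achieves good tracking performance and meets real-time, accuracy and robustness requirements.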

16.
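To extract the centers of structured-light laser stripes in real time, the directional template algorithm was adapted for hardware, and a dedicated hardware architecture was proposed and implemented. A hardware design based on pipelining and parallelism ensures real-time execution of the algorithm. Real-time extraction of the stripe center lines in structured-light laser images was implemented on a field-programmable gate array (FPGA). Experiments show that implementing dedicated video-processing algorithms on an FPGA offers low cost, good real-time performance and a short development cycle.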

17.
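A Rapid Prototyping Platform for Hardware/Software Co-design of Embedded Systems   Total citations: 5, self-citations: 2, citations by others: 5
A rapid prototyping platform for hardware/software co-design of embedded systems is proposed. The scheme uses system-on-a-programmable-chip and soft processor core technology to build the FPGA array and the scalable processors that the platform requires. This achieves tighter, more flexible coupling between hardware and software and lower communication latency. The use of reconfigurable logic gives the platform a simple, regular structure. On the one hand, this makes it easier to interconnect multiple platforms; on the other, it allows the logic resources in the FPGA chips to be used more fully.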

18.
The design of specialized processing array architectures capable of executing any given algorithm is proposed. The algorithm is first represented as a dataflow graph, which is then mapped onto the specialized processor array. The processors in this array execute the operations contained in the corresponding nodes (or subsets of nodes) of the dataflow graph, while regular interconnections between these elements serve as the edges of the graph. To speed up execution, the array allows computation fronts to be generated and later cancelled, depending on the arriving data operands; it is therefore called a data-driven array. The structure of the basic cell and its programming are examined. Some design details are presented for two selected blocks, the instruction memory and the flag array. A scheme for mapping a dataflow graph (program) onto a hexagonally connected array is described and analyzed. Two distinct performance measures, mapping efficiency and array utilization, are discussed, along with some performance results.
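The data-driven execution model rests on a firing rule: a node executes as soon as all of its operands have arrived, not in a fixed program order. The sketch below illustrates only that rule in software. It does not model the hexagonal array, the mapping scheme, or the generation and cancellation of computation fronts. The graph format and function name are hypothetical.

```python
import operator
from collections import deque

def run_dataflow(graph, inputs):
    """Fire each node once all of its operand tokens have arrived.
    graph: dict node -> (op, [source names]); sources are nodes or input names.
    inputs: dict input name -> value. Returns (values, firing order)."""
    values = dict(inputs)
    pending = {n: len(srcs) for n, (_, srcs) in graph.items()}
    consumers = {}
    for n, (_, srcs) in graph.items():
        for s in srcs:
            consumers.setdefault(s, []).append(n)
    # Nodes without operands (constants) can fire immediately.
    ready = deque(n for n, k in pending.items() if k == 0)

    def deliver(name):
        # A new token on `name` reduces the outstanding operand count of each consumer.
        for c in consumers.get(name, []):
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)

    for name in inputs:
        deliver(name)
    fired = []
    while ready:
        n = ready.popleft()
        op, srcs = graph[n]
        values[n] = op(*(values[s] for s in srcs))
        fired.append(n)
        deliver(n)
    return values, fired

# (a + b) * (a - b): the add and subtract nodes are independent and could fire in parallel.
graph = {
    "s": (operator.add, ["a", "b"]),
    "d": (operator.sub, ["a", "b"]),
    "p": (operator.mul, ["s", "d"]),
}
values, fired = run_dataflow(graph, {"a": 5, "b": 3})
print(values["p"], fired)  # 16 ['s', 'd', 'p']
```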

19.
20.
Su Rui, Liu Guizhong, Zhang Tongyu. Chinese Journal of Computers (计算机学报), 2006, 29(10): 1772-1779
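Based on an improved linear processor array, an array processor architecture for full-search motion estimation is proposed. It performs computations in parallel while requiring only serial data input. Analysis shows that the architecture not only executes efficiently but also needs only a small internal buffer. Owing to its simple structure and regular data flow, it can easily be implemented in an FPGA device and used as a coprocessor for a real-time encoder.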
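For reference, the computation such an array parallelizes is exhaustive block matching. Every candidate displacement within a ±r search window is scored by the sum of absolute differences (SAD), and the best one is kept. The serial sketch below shows that workload, not the proposed array's dataflow. The function name, frame layout (lists of rows) and boundary handling are assumptions.

```python
def full_search(cur, ref, bx, by, n, r):
    """Exhaustive block matching: for the n x n block at (bx, by) in `cur`,
    return (dx, dy, sad) minimizing the SAD against `ref` within +/- r pixels."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue  # candidate block falls outside the reference frame
            sad = sum(abs(cur[by + i][bx + j] - ref[y0 + i][x0 + j])
                      for i in range(n) for j in range(n))
            if best is None or sad < best[2]:
                best = (dx, dy, sad)
    return best

# The current frame is the reference shifted so the true motion is (dx=2, dy=1).
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
       for y in range(8)]
print(full_search(cur, ref, 2, 2, 2, 2))  # (2, 1, 0)
```

Each of the (2r + 1)² candidates costs n² absolute differences, all independent of one another. This regularity is what lets a linear processor array evaluate them in parallel while the pixels are streamed in serially.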
