1.

Wu Guiming, Dou Yong, Wang Miao 《Computer Engineering & Science》2009,31(Z1)

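This paper proposes a data-driven processor array architecture that effectively balances storage and computation and is well suited to implementing high-performance algorithm acceleration on FPGAs. It also proposes an automatic synthesis framework targeting this architecture, through which ordinary loops can be mapped efficiently onto the data-driven processor array. Experimental results demonstrate the effectiveness of the automatic synthesis framework, and the generated designs outperform a general-purpose processor.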

2.

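Application of Pipelined Configuration Technology in Reconfigurable Processors  Total citations: 1, self-citations: 1, citations by others: 0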

Yu Sudong, Liu Leibo, Wei Shaojun 《Computer Engineering》2010,36(8):227-229

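A pipelined configuration technique for reconfigurable processors is proposed; it effectively reduces configuration time and increases application execution speed. The reconfigurable processor consists of a general-purpose processor and a coarse-grained reconfigurable array. The array handles the loops that account for most of an application's execution time; these loops are decomposed into different rows that execute on the array in a pipelined fashion. The technique was verified on an FPGA-based verification system, using the integer discrete cosine transform and motion estimation from the H.264 benchmark as test applications. Compared with the conventional reconfigurable processors PipeRench and MorphoSys and with TI's TMS320DM642 DSP, it achieves a performance improvement of about 3.5 times.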
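
As a rough, idealized illustration of why running loop iterations as a pipeline over the array rows pays off, assume a loop of N iterations split into S stages (rows), one cycle per stage, and no configuration or memory stalls; these assumptions are not taken from the paper. Sequential and pipelined execution then take:

```latex
T_{\text{seq}} = N S, \qquad
T_{\text{pipe}} = S + (N - 1), \qquad
\frac{T_{\text{seq}}}{T_{\text{pipe}}} = \frac{N S}{S + N - 1} \;\longrightarrow\; S \quad (N \to \infty)
```

For example, N = 64 and S = 4 give 256 versus 67 cycles, a speedup of about 3.82. This ideal bound is separate from the roughly 3.5 times figure in the abstract, which compares against other processors.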

3.

Applying frame layout to hardware design in FPGA for seamless support of cross calls in CPU-FPGA coupling architecture

Giang Nguyen Thi Huong, Yeoul Na, Seon Wook Kim 《Microprocessors and Microsystems》2011,35(5):462-472

4.

Design of Processor Arrays for Reconfigurable Architectures

Fimmel Dirk, Merker Renate 《The Journal of Supercomputing》2001,19(1):41-56

This paper deals with the design of processor arrays for regular algorithms. The design is constrained by the limited implementation cost that characterizes a reconfigurable architecture. The objective of the design is to minimize the latency of the processor array. The presented approach for determining a scheduling function that yields the minimal latency of the processor array is formulated as a linear program that incorporates 1) the selection of modules to be implemented in processors to execute the operations of the algorithm, 2) the binding of operations to modules, 3) the computation of the number of registers, 4) the limitation of implementation cost for modules and registers, and 5) the determination of the partition size that fits within the limited implementation cost.
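
The linear program in the entry above searches for a minimal-latency scheduling function under resource constraints. To show what is being minimized, the sketch below assumes the common linear form t(i) = λ · i over a 2-D box of iterations with uniform dependence vectors d: a schedule is valid if λ · d ≥ 1 for every d, and its latency is max t - min t + 1. It ignores module selection, binding, register counts, and partitioning, and replaces the LP with brute-force enumeration of small integer λ; the function names, dependence vectors, and bounds are hypothetical.

```python
from itertools import product


def latency(lam, bounds):
    """Latency of the schedule t(i) = lam . i over the box 0 <= i_k < bounds[k].

    A linear function over a box attains its extremes at the corners,
    so only the corners need to be evaluated.
    """
    corners = product(*[(0, b - 1) for b in bounds])
    times = [sum(l * c for l, c in zip(lam, corner)) for corner in corners]
    return max(times) - min(times) + 1


def best_schedule(deps, bounds, search=3):
    """Enumerate integer lam with |lam_k| <= search and keep the valid
    schedule (lam . d >= 1 for every dependence d) with the lowest latency."""
    best = None
    for lam in product(range(-search, search + 1), repeat=len(bounds)):
        if all(sum(l * d for l, d in zip(lam, dep)) >= 1 for dep in deps):
            lat = latency(lam, bounds)
            if best is None or lat < best[0]:
                best = (lat, lam)
    return best
```

With dependences (1, 0), (0, 1), (1, -1) over an 8 x 8 domain, `best_schedule` returns (22, (2, 1)): the third dependence forces λ1 ≥ λ2 + 1, so λ = (2, 1) is the cheapest valid choice.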

5.

Efficient Hardware/Software Implementation of an Adaptive Neuro-Fuzzy System  Total citations: 1, self-citations: 0, citations by others: 1

del Campo I., Echanobe J., Bosque G., Tarela J.M. 《IEEE Transactions on Fuzzy Systems》2008,16(3):761-778

This paper describes the development of efficient hardware/software (HW/SW) neuro-fuzzy systems. The model used in this work is an adaptive neuro-fuzzy inference system modified for efficient HW/SW implementation. Two different on-chip designs are presented: a high-performance parallel architecture for offline training and a pipelined architecture suitable for online parameter adaptation. Details of important aspects of the design of HW/SW solutions are given. The proposed architectures have been implemented using a system-on-a-programmable-chip. The device contains an embedded processor core and a large field programmable gate array (FPGA). The processor provides flexibility and high precision for implementing the learning algorithms, while the FPGA allows the development of high-speed inference architectures for real-time embedded applications.

6.

An FPGA implementation for neural networks with the FDFM processor core approach

《International Journal of Parallel, Emergent and Distributed Systems》2013,28(4):308-320

This paper presents a field programmable gate array (FPGA) implementation of a three-layer perceptron using the few DSP blocks and few block RAMs (FDFM) approach, implemented in a Xilinx Virtex-6 family FPGA. In the FDFM approach, multiple processor cores, each with few DSP slices and few block RAMs, are used. We have implemented 150 processor cores for perceptrons in a Xilinx Virtex-6 family FPGA XC6VLX240T-FF1156. The implementation results show that the 150 processor cores for 32-32-32 input–hidden–output layer perceptrons can be implemented in the FPGA using 150 DSP48 slices, 185 block RAMs and 9676 slices. The design runs at a clock frequency of 242.89 MHz, and a single evaluation of a 150-node perceptron can be performed 1.65 × 10⁷ times per second.
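
For reference, one evaluation of a 32-32-32 perceptron, the workload the entry above maps onto DSP48-based cores, amounts to the multiply-accumulate loops sketched below: 32 × 32 MACs per layer, or 2048 in total, plus the activations. This is a plain Python sketch; the sigmoid activation and the weights are assumptions, since the abstract does not state them.

```python
import math


def sigmoid(v):
    """Logistic activation, written to avoid overflow in math.exp."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def layer(inputs, weights, biases):
    """Fully connected layer: one multiply-accumulate loop per output node."""
    outputs = []
    for w_row, b in zip(weights, biases):
        acc = b
        for w, x in zip(w_row, inputs):
            acc += w * x  # the multiply-accumulate a DSP slice would perform in hardware
        outputs.append(sigmoid(acc))
    return outputs


def perceptron(x, w1, b1, w2, b2):
    """Three-layer perceptron: input -> hidden -> output."""
    return layer(layer(x, w1, b1), w2, b2)
```

Because each output node is an independent MAC loop, the nodes can be spread across many small cores, which is the kind of parallelism the FDFM approach exploits.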

7.

Research and Implementation of a High-Speed FPGA-Based 2-D DCT

Liu Qing, Chen Jinqiang, Yu Peiling 《Computer Engineering & Science》2012,34(3):103-107

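In space science experiments, image resolution keeps rising and data volumes keep growing, so image data must be compressed on board the satellite before transmission. Because FPGAs offer low power consumption and high performance and are already widely used in satellite payloads, image compression can be implemented on an FPGA. The core of FPGA-based image compression is the DCT, which consumes a large amount of multiplier resources. To improve compression efficiency while reducing dependence on dedicated multipliers, this paper makes full use of the FPGA's BRAM and LUT resources and, using an improved distributed arithmetic algorithm, a pipelined structure, and ping-pong buffering, implements the DCT of the JPEG compression algorithm without any multipliers, with good portability. Verification shows that, when used in an FPGA-based JPEG image compression system, the method is significantly faster than conventional fast DCT algorithms.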
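
Distributed arithmetic (DA), named in the entry above, replaces the multiplications in a fixed-coefficient inner product, such as one output of an 8-point DCT, with table lookups, shifts, and additions. The Python sketch below is the generic textbook DA formulation, not the authors' improved variant; the integer coefficients (illustrative scaled cosine values) and the 8-bit two's-complement input width are assumptions.

```python
def build_da_table(coeffs):
    """Precompute the coefficient sum for every bit pattern of the inputs.

    Entry `mask` holds sum(coeffs[k] for each k whose bit is set in mask);
    in hardware this table would live in a LUT or BRAM.
    """
    n = len(coeffs)
    table = [0] * (1 << n)
    for mask in range(1 << n):
        table[mask] = sum(c for k, c in enumerate(coeffs) if (mask >> k) & 1)
    return table


def da_inner_product(xs, table, bits=8):
    """Compute sum(c_k * x_k) using only lookups, shifts and adds.

    xs are signed integers in `bits`-bit two's complement. Each step gathers
    one bit-plane of all inputs into an address, looks up the partial sum,
    and accumulates it shifted by the bit position; the sign bit-plane is
    subtracted instead of added.
    """
    acc = 0
    for b in range(bits):
        mask = 0
        for k, x in enumerate(xs):
            # Python's >> on negative ints is arithmetic, so this reads two's-complement bits.
            if (x >> b) & 1:
                mask |= 1 << k
        partial = table[mask] << b
        acc += -partial if b == bits - 1 else partial
    return acc
```

For example, with coefficients [89, 75, 50, 18] and inputs [10, -3, 7, -128], `da_inner_product(xs, build_da_table(coeffs))` returns -1289, the same as the direct dot product. The table grows as 2^n entries, which is why practical DA designs usually split longer inner products into smaller groups.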

8.

Application of Clock Synchronization Technology in Data Acquisition Systems

Fan Yeming, Liu Zengwu 《Computer & Digital Engineering》2011,39(2):98-101

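A synchronized system clock design is presented. To improve system clock synchronization and system reliability, a field-programmable gate array (FPGA) replaces the traditional processor as the control core, and the design uses a phase-locked loop (PLL) and the Verilog hardware description language to achieve clock synchronization through reset. Practice shows that the design runs stably, is highly reliable, and is suitable for operation at high clock frequencies.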

9.

Flexible VLIW processor based on FPGA for efficient embedded real-time image processing

Vincent Brost, Fan Yang, Charles Meunier 《Journal of Real-Time Image Processing》2014,9(1):47-59

Modern field programmable gate array (FPGA) chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance VLIW (very long instruction word) processor core in an FPGA. In a VLIW architecture, processor effectiveness depends on the ability of compilers to extract sufficient ILP (instruction-level parallelism) from program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors to shorten the development cycle and to use powerful FPGA resources to increase real-time performance. We present a flexible VLIW VHDL processor model with a variable instruction set and a customizable architecture, which allows the intrinsic parallelism of a target application to be exploited using advanced compiler technology and implemented optimally on an FPGA. Some common image processing algorithms were tested and validated using the proposed development cycle. We also rapidly prototyped embedded contactless palmprint extraction on a Virtex-6 FPGA-based board for a biometric application and obtained a processing time of 145.6 ms per image. Our approach applies several criteria for co-design tools: flexibility, modularity, performance, and reusability.

10.

A High-Speed 16-Bit Parallel Data Acquisition Interface for PCs

Qiu Hong'an 《Journal of Data Acquisition & Processing》2000,15(4):516-519

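This paper introduces a high-speed 16-bit parallel data acquisition interface implemented on a PC. The interface consists of a high-speed optoelectronic isolation circuit, a dual-port FIFO buffer circuit, and computer interface logic and control circuitry built from an FPGA chip. Optocouplers isolate the terminal display-and-processing system from the front-end data processing system, preventing mutual interference and effectively solving the electromagnetic interference and real-time transfer problems that arise when large volumes of 16-bit parallel data are transmitted at high speed. Using a field-programmable gate array (FPGA) chip moves hardware design into software, which enables complex logic functions while shrinking the hardware circuitry and improving system reliability. The interface is well suited to complex systems such as radar and sonar.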

11.

The design of a language processor generator

Kai Koskimies, Otto Nurmi, Jukka Paakki, Seppo Sippu 《Software》1988,18(2):107-135

Language processor generators are systems that produce various language processors (including compilers) from a high-level specification. The design of language processor generators is discussed on the basis of experiments with a traditional compiler writing system (HLP78) employing pure LALR parsing and general attribute grammars. It is argued that these methods are too primitive from a practical point of view. The design of a new language processor generator, HLP84, is based on this view. This system is an attempt to provide high-level tools for a restricted class of applications (one-pass analysis). The syntactic facilities include regular expressions on the right-hand sides of productions, a disambiguating mechanism integrated with the regular expressions, and a mechanism for using semantic information to aid parsing. The semantic facilities include automatic support for semantic error handling and for symbol tables. Early experience with the new system shows that, despite the general overhead caused by the higher level of automation, the system can generate reasonably efficient processors.

12.

On the hardware implementation of RIPEMD processor: Networking high speed hashing, up to 2 Gbps

N. Sklavos, O. Koufopavlou 《Computers & Electrical Engineering》2005,31(6):361-379

The continued growth of both wired and wireless communications has triggered a revolution in high-speed security implementations. RIPEMD hash functions are widely used in many cryptographic applications. This work proposes a reconfigurable processor architecture and a VLSI implementation of these functions. The processor is reconfigurable in the sense that it can alternatively perform all of the RIPEMD hash functions. To show the advantages of the proposed design, each of these hash functions has also been implemented in a separate hardware device (FPGA). The FPGA implementation of the proposed processor achieves hashing speeds of up to 2 Gbps. Compared with previously published hardware designs, the proposed processor delivers 22 to 30 times higher performance. It also performs much better than assembly-language implementations of RIPEMD-128 and RIPEMD-160. The proposed processor could be used to implement data integrity units and in many other sensitive cryptographic applications, such as digital signatures, message authentication codes, and random number generators.

13.

DOP—a simple processor for custom computing machines

《Journal of Microcomputer Applications》1994,17(3):239-253

A general-purpose processor cell, called DOP, is presented. The DOP architecture is designed to efficiently support high-level programming languages (HLLs) such as C or Pascal while remaining simple enough to be implemented on one field programmable gate array (FPGA). Special attention is paid to the analysis of HLL requirements on processors. The DOP is designed to be used as a building block (cell) in an FPGA library. Its simplicity allows other microcomputer functional units to be implemented on the same FPGA. The DOP serves as a core for simple solutions using currently available technology.

14.

A high speed multi-level-parallel array processor for vision chips

SHI Cong, YANG Jie, WU NanJian, WANG ZhiHua 《Science China Information Sciences》2014,(6):207-218

This paper proposes a high-speed multi-level-parallel array processor for programmable vision chips. The processor includes a 2-D pixel-parallel processing element (PE) array and a 1-D row-parallel row processor (RP) array. Both arrays operate in a single-instruction multiple-data (SIMD) fashion and share a common instruction decoder. The array sizes can be scaled to suit the target application. In the PE array, each PE can communicate not only with its nearest-neighbor PEs but also with the next-nearest-neighbor PEs in the diagonal directions. This connectivity helps speed up local operations in low-level image processing. Global operations in mid-level processing, on the other hand, are accelerated by the skipping chain and binary boosters in the RP array. The array processor was implemented on an FPGA device and successfully tested on various algorithms, including real-time face detection based on the PPED algorithm. The results show that the image processing speed of the proposed processor is much higher than that of state-of-the-art digital vision chips.
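
The PE array in the entry above lets each PE read its four nearest neighbors and also its diagonal neighbors, all under one shared instruction. The sketch below emulates one such SIMD step in Python: every PE applies the same operation to its 3x3 (8-neighbor) window, here a binary erosion chosen as an assumed example of a local low-level operation. Treating out-of-array neighbors as 0 is also an assumption.

```python
def simd_step(image, op):
    """Apply `op` to every PE's 3x3 neighborhood in lock-step (SIMD).

    All PEs read from the same input snapshot, as in a synchronous array,
    so the result does not depend on the order in which PEs are visited.
    Neighbors outside the array read as 0.
    """
    rows, cols = len(image), len(image[0])

    def read(r, c):
        return image[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [
        [op([read(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)])
         for c in range(cols)]
        for r in range(rows)
    ]


def erode(window):
    """Binary erosion: a pixel stays 1 only if all 9 pixels in its window are 1."""
    return int(all(window))
```

Applied to a 5x5 image containing a 3x3 block of ones, `simd_step(image, erode)` leaves a single 1 at the center. Without the diagonal links, the corner pixels of each window would need extra communication steps through an intermediate neighbor.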

15.

Embedded Omnidirectional Vision Tracker

Li Tao, Cao Zuoliang, Zhu Junchao 《Automation & Instrumentation》2011,26(1)

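To enable autonomous navigation of an AGV (automatic guided vehicle), an embedded tracker based on omnidirectional vision is designed. The tracker integrates a fisheye lens with a field-programmable gate array (FPGA), a digital signal processor (DSP), and a CMOS image sensor, yielding a compact, structured design. An improved tracking algorithm combining mean shift and particle filtering is ported to the tracker to achieve dynamic multi-target tracking on fisheye images, and its advantages are demonstrated through analysis and comparison with traditional algorithms. Experimental results show that the tracker performs well and meets real-time, accuracy, and robustness requirements.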
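
The tracker in the entry above combines an improved mean-shift algorithm with a particle filter. The core mean-shift step, sketched below in its generic flat-kernel form (not the authors' improved version), repeatedly moves a window to the weighted centroid of the samples it covers until it stops moving. The point weights stand in for per-pixel target likelihoods such as color-histogram back-projection values; that interpretation, and the function name, are assumptions.

```python
def mean_shift(points, weights, start, radius, max_iter=50, eps=1e-6):
    """Generic flat-kernel mean shift in 2-D.

    Repeatedly moves the window center to the weighted centroid of the
    points lying within `radius` of it, stopping when the move is below
    `eps` or after `max_iter` iterations.
    """
    cx, cy = start
    for _ in range(max_iter):
        sw = sx = sy = 0.0
        for (px, py), w in zip(points, weights):
            if (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2:
                sw += w
                sx += w * px
                sy += w * py
        if sw == 0.0:
            break  # no samples under the window: stay put
        nx, ny = sx / sw, sy / sw
        if abs(nx - cx) < eps and abs(ny - cy) < eps:
            return nx, ny
        cx, cy = nx, ny
    return cx, cy
```

Starting at (8, 8) with radius 5 next to a small cluster centered on (10, 10), the window converges to (10.0, 10.0) and ignores a distant cluster near (50, 50). This locality is why mean shift alone can lose a fast-moving target, one common motivation for pairing it with a particle filter.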

16.

Real-Time FPGA-Based Detection of Laser Stripe Centers

Qian Zhengtie, Li Dehua 《Computer Engineering and Applications》2004,40(27):49-52

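To extract the centers of structured-light laser stripes in real time, the directional template algorithm is adapted for hardware, and a dedicated hardware architecture is proposed and implemented. A hardware design based on pipelining and parallelism ensures real-time execution of the algorithm, and a field-programmable gate array (FPGA) is used to extract the stripe centerlines from structured-light laser images in real time. Experiments show that implementing dedicated video processing algorithms on an FPGA offers low cost, good real-time performance, and a short development cycle.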

17.

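A Rapid Prototyping Platform for Hardware/Software Co-design of Embedded Systems  Total citations: 5, self-citations: 2, citations by others: 5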

Wu Baifeng, Peng Chenglian, Sun Xiaoguang 《Journal of Computer-Aided Design & Computer Graphics》2003,15(7):778-782

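A rapid prototyping platform for hardware/software co-design of embedded systems is proposed. It uses system-on-a-programmable-chip devices and soft processor cores to build the FPGA array and the scalable processors the platform requires, achieving tighter, more flexible hardware/software coupling and lower communication latency. The use of reconfigurable logic gives the platform a simple, regular structure, which makes it easier to interconnect and extend multiple platforms and allows the logic resources of the FPGA chips to be used more fully.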

18.

A data-driven VLSI array for arbitrary algorithms

Koren I., Mendelson B., Peled I., Silberman G.M. 《Computer》1988,21(10):30-43

The design of specialized processing array architectures capable of executing any given arbitrary algorithm is proposed. In the approach adopted, the algorithm is first represented as a dataflow graph and then mapped onto the specialized processor array. The processors in this array execute the operations contained in the corresponding nodes (or subsets of nodes) of the dataflow graph, while regular interconnections between these elements serve as the edges of the graph. To speed up execution, the proposed array allows computation fronts to be generated and later canceled, depending on the arriving data operands; hence it is called a data-driven array. The structure of the basic cell and its programming are examined. Some design details are presented for two selected blocks, the instruction memory and the flag array. A scheme for mapping a dataflow graph (program) onto a hexagonally connected array is described and analyzed. Two distinct performance measures, mapping efficiency and array utilization, are discussed along with some performance results.
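
In the data-driven model of the entry above, a node of the dataflow graph fires as soon as all of its operands have arrived, regardless of program order. The Python sketch below emulates that firing rule on a small, hypothetical graph computing (a + b) * (a - c). It models only token arrival and firing, not the hexagonal mapping or the generation and cancellation of computation fronts described in the paper.

```python
import operator
from collections import deque

# Hypothetical dataflow graph: node -> (operation, list of operand sources).
# A source is either a graph input ("a", "b", "c") or another node's name.
GRAPH = {
    "add": (operator.add, ["a", "b"]),
    "sub": (operator.sub, ["a", "c"]),
    "mul": (operator.mul, ["add", "sub"]),
}


def run_dataflow(graph, inputs):
    """Fire each node once all of its operand tokens are available.

    Returns the token values and the order in which nodes fired.
    """
    values = dict(inputs)    # tokens that have arrived so far
    pending = deque(graph)   # nodes that have not fired yet
    fire_order = []
    while pending:
        progressed = False
        for _ in range(len(pending)):
            node = pending.popleft()
            op, srcs = graph[node]
            if all(s in values for s in srcs):
                values[node] = op(*(values[s] for s in srcs))
                fire_order.append(node)
                progressed = True
            else:
                pending.append(node)  # operands missing: keep waiting
        if not progressed:
            raise ValueError("graph has a cycle or an unbound input")
    return values, fire_order
```

With a = 5, b = 2, c = 1, `run_dataflow(GRAPH, ...)` yields 28 for `mul` and fires `add`, `sub`, `mul` in that order. Listing `mul` first in the dictionary does not change the firing order, because a node waits until its operands exist.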

19.

Correct translation of data parallel assignment onto array processors

J. P. Wray, A. Stewart 《Formal Aspects of Computing》1994,6(4):417-439

20.

An Array Processor Architecture for Motion Estimation with an Efficient Buffering Strategy

Su Rui, Liu Guizhong, Zhang Tongyu 《Chinese Journal of Computers》2006,29(10):1772-1779

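Based on an improved linear processor array, an array processor architecture for full-search motion estimation is proposed; it performs computations in parallel while requiring only serial data input. Analysis shows that the architecture not only executes efficiently but also needs only a small internal buffer. Thanks to its simple structure and regular data flow, it can be readily implemented in FPGA devices and used as a coprocessor for real-time encoders.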
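
Full-search motion estimation, which the architecture in the entry above accelerates, computes the sum of absolute differences (SAD) between the current block and every candidate block in a search window and keeps the displacement with the lowest SAD. The reference sketch below shows only that computation, not the array's data flow or buffering; the block size, search range, and function names are illustrative assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Try every displacement in [-p, p] x [-p, p] that keeps the candidate
    block inside `ref`; return (best_sad, (dx, dy))."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= by + dy and by + dy + n <= h
                    and 0 <= bx + dx and bx + dx + n <= w):
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

Every candidate reuses most of the reference pixels of its neighbor in the window, which is the regularity that lets a linear array stream the reference data in serially instead of buffering the whole search area.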