1.

A method for partitioning applications in hybrid reconfigurable architectures

Michalis D. Galanis Athanasios Milidonis George Theodoridis Dimitrios Soudris Costas E. Goutis 《Design Automation for Embedded Systems》2005,10(1):27-47

In this paper, we propose a methodology for accelerating application segments by partitioning them between reconfigurable hardware blocks of different granularity. Critical parts are sped up on the coarse-grain reconfigurable hardware to meet the timing requirements of application code mapped on the reconfigurable logic. The reconfigurable processing units are embedded in a generic hybrid system architecture which can model a large number of existing heterogeneous reconfigurable platforms. The fine-grain reconfigurable logic is realized by an FPGA unit, while the coarse-grain reconfigurable hardware is realized by our developed high-performance data-path. The methodology mainly consists of three stages: the analysis, the mapping of the application parts onto fine and coarse-grain reconfigurable hardware, and the partitioning engine. A prototype software framework realizes the partitioning flow. In this work, the methodology is validated using five real-life applications. Analytical partitioning experiments show that the speedup relative to the all-FPGA mapping solution ranges from 1.5 to 4.0, while the specified timing constraints are satisfied for all the applications.
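The partitioning idea in this abstract can be illustrated with a small sketch. It is not the authors' partitioning engine: the greedy rule, the `Kernel` type, and all cycle counts below are hypothetical, chosen only to show how moving critical kernels from the FPGA to a coarse-grain data-path until a timing constraint holds yields a speedup over the all-FPGA mapping.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    fpga_cycles: int    # assumed cost when mapped on fine-grain (FPGA) logic
    coarse_cycles: int  # assumed cost when mapped on the coarse-grain data-path


def partition(kernels, timing_constraint):
    """Greedy sketch: start from an all-FPGA mapping and move the kernels
    with the largest cycle saving to coarse-grain hardware until the total
    cycle count meets the timing constraint (or nothing else helps)."""
    mapping = {k.name: "fpga" for k in kernels}
    total = sum(k.fpga_cycles for k in kernels)
    by_gain = sorted(kernels, key=lambda k: k.fpga_cycles - k.coarse_cycles,
                     reverse=True)
    for k in by_gain:
        if total <= timing_constraint:
            break  # constraint already satisfied
        gain = k.fpga_cycles - k.coarse_cycles
        if gain <= 0:
            break  # remaining kernels would not run faster on coarse-grain
        mapping[k.name] = "coarse"
        total -= gain
    return mapping, total


# Hypothetical application with three kernels.
kernels = [
    Kernel("fft", 4000, 1000),
    Kernel("filter", 2500, 1500),
    Kernel("control", 500, 600),
]
all_fpga = sum(k.fpga_cycles for k in kernels)
mapping, total = partition(kernels, timing_constraint=4500)
speedup = all_fpga / total
print(mapping, total, speedup)
```

Under these made-up numbers only the most critical kernel moves to the coarse-grain hardware; a real flow would also account for communication and reconfiguration overheads, which this sketch ignores.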

2.

Dynamic Reconfiguration Technologies Based on FPGA in Software Defined Radio System

Ke He Louise Crockett Robert Stewart 《Journal of Signal Processing Systems》2012,69(1):75-85

Partial Reconfiguration (PR) is a method for Field Programmable Gate Array (FPGA) designs which allows multiple applications to time-share a portion of an FPGA while the rest of the device continues to operate unaffected. Using this strategy, the physical layer processing architecture in Software Defined Radio (SDR) systems can benefit from reduced complexity and increased design flexibility, as different waveform applications can be grouped into one part of a single FPGA. Waveform switching often means changing not only functionality but also the FPGA clock frequency. However, that is beyond the current functionality of PR processes, as the clock components (such as Digital Clock Managers (DCMs)) are excluded from the process of partial reconfiguration. In this paper, we present a novel architecture that combines another reconfigurable technology, Dynamic Reconfigurable Port (DRP), with PR based on a single FPGA in order to dynamically change both functionality and clock frequency. The architecture is demonstrated to reduce hardware utilization significantly compared with standard, static FPGA design.

3.

FPGA Implementation of a High-Performance Parallel Computing Architecture

钟瑜吴明钦《电讯技术》2019,59(7):829-835

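To address the low efficiency of traditional Field Programmable Gate Array (FPGA) development methods and their failure to make full use of on-chip logic resources, a high-performance parallel computing architecture is proposed. A unified software/hardware programming model is designed, with support provided at the FPGA operating-system level, and partial reconfiguration is applied to the development of hardware threads, giving the architecture the ability to manage and reuse resources. A software/hardware co-development flow is also designed. The design is verified on the ZC702 development board, the additional resource overhead of the architecture is evaluated, and a sorting algorithm is used as an example to demonstrate the flexibility of the architecture's multithreaded design.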

4.

Design of an Asynchronous Reconfigurable Architecture for Cryptographic Algorithms

熊华沈海斌季爱明潘雪增《微电子学与计算机》2005,22(3):170-173,177

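To address the shortcomings of FPGAs and ASICs in implementing cryptographic algorithms, this paper introduces an asynchronous reconfigurable architecture oriented to cryptographic algorithms. The computational functions of the architecture are provided by an array of reconfigurable cells, the data path is implemented by the interconnections among the cells, and asynchronous communication is carried out with handshake signals. After analyzing how the propagation delay of the handshake signals affects the reconfigurable architecture, the paper proposes a handshake control circuit for inter-cell signal transfer that suits this architecture. In addition, the arithmetic circuits within each cell are designed with an improved DSDCVS logic, which reduces cell area and increases cell operating speed. Application examples show that, when implementing cryptographic algorithms, the asynchronous reconfigurable architecture performs better than an FPGA.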

5.

A Reconfigurable Fast Finite-Field Multiplier Architecture (cited by: 1; self-citations: 0; citations by others: 1)

袁丹寿戎蒙恬《电子与信息学报》2006,28(4):717-720

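Building on an improved serial multiplier, a reconfigurable fast multiplier architecture for the finite field GF(2^m) (1 < m ≤ M) is proposed. A set of configuration signals and logic circuitry changes the field degree m, making the multiplier reconfigurable and programmable. Clock gating is used to reduce circuit power consumption. The multiplier architecture features reconfigurability, high flexibility, and low circuit complexity. Compared with a conventional shift multiplier, it doubles the multiplier speed. The multiplier is suited to VLSI designs of high-performance cryptographic algorithms that require variable finite fields and low hardware complexity.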
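For readers unfamiliar with serial finite-field multiplication, the following is a minimal software sketch of the conventional MSB-first shift-and-add scheme that such hardware is usually compared against. It is not the architecture proposed in the paper; the function name `gf2m_mul` and its parameters are assumptions made for illustration. Passing the degree `m` and the irreducible polynomial `poly` as arguments loosely mirrors the idea of reconfiguring the field at run time.

```python
def gf2m_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m) with an MSB-first shift-and-add loop.

    a, b : field elements as integers below 2**m (bit i = coefficient of x^i)
    m    : field degree
    poly : irreducible polynomial including the x^m term,
           e.g. 0b10011 for x^4 + x + 1
    """
    result = 0
    for i in range(m - 1, -1, -1):
        result <<= 1                 # multiply the partial result by x
        if (result >> m) & 1:        # degree reached m: reduce modulo poly
            result ^= poly
        if (b >> i) & 1:             # add (XOR) a when bit i of b is set
            result ^= a
    return result


# GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)
print(hex(gf2m_mul(0x57, 0x83, 8, 0x11B)))
# Same routine, "reconfigured" to GF(2^4) with x^4 + x + 1
print(bin(gf2m_mul(0b0010, 0b1000, 4, 0b10011)))
```

The loop takes m iterations, one per bit of `b`, which is the serial cost that faster multiplier designs aim to reduce.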

6.

A design flow for speeding-up DSP applications in heterogeneous reconfigurable systems

Michalis D. Galanis Athanassios Milidonis Athanassios P. Kakarountas Costas E. Goutis 《Microelectronics Journal》2006,37(6):554-564

In this paper, we propose a method for speeding up Digital Signal Processing applications by partitioning them between reconfigurable hardware blocks of different granularity and mapping critical parts of applications on coarse-grain reconfigurable hardware. The reconfigurable hardware blocks are embedded in a heterogeneous reconfigurable system architecture. The fine-grain part is implemented by an embedded FPGA unit, while for the coarse-grain reconfigurable hardware our developed high-performance coarse-grain data-path is used. The design flow mainly consists of three steps: the analysis procedure, the mapping onto coarse-grain blocks, and the mapping onto the fine-grain hardware. In this work, the methodology is validated using five real-life applications: an OFDM transmitter, a medical imaging technique, a wavelet-based image compressor, a video compression scheme and a JPEG encoder. The experimental results show that the speedup, relative to an all-FPGA solution, ranges from 1.55 to 4.17 for the considered applications.

7.

A New FPGA for DSP Applications Integrating BIST Capabilities

Alex Gonsales Marcelo Lubaszewski Luigi Carro Michel Renovell 《Journal of Electronic Testing》2004,20(4):423-431

This work proposes a new FPGA architecture to meet the requirements of signal processing and testing of current system-on-chip designs. The proposed architecture provides the hardware reuse and reconfigurability advantages of an FPGA, not only for the system functionality but also for the system testing, while keeping the performance level required by current signal processing applications. This paper presents the new FPGA model, along with preliminary experimental results that clearly show the possible advantages at the system level of merging design and test in a reconfigurable device.

8.

Deployment of Run-Time Reconfigurable Hardware Coprocessors Into Compute-Intensive Embedded Applications

Francisco Fons Mariano Fons Enrique Cantó Mariano López 《Journal of Signal Processing Systems》2012,66(2):191-221

Day after day, embedded systems add more compute-intensive applications to their end products: cryptography and image and video processing are some examples found in leading markets like consumer electronics and automotive. To meet these ever-increasing computational demands, hardware accelerators synthesized in field-programmable gate arrays (FPGAs) make it possible to achieve processing speedups of orders of magnitude over their CPU-based software counterparts. However, the inherent increase in physical resources carries a cost penalty. To address this issue, dynamically reconfigurable hardware technology has reached maturity. SRAM-based reconfigurable logic goes beyond the classical conception of static hardware resources distributed in space and held invariant for the entire application life cycle; it provides a new design abstraction characterized by the temporal partitioning of such resources to promote their continuous reuse, reconfiguring them on the fly to play a different role at each instant. This new computing paradigm makes it possible to balance the design of embedded applications by partitioning their functionality in space and time, through a series of mutually exclusive processing tasks synthesized and multiplexed in time on the same set of resources, thus achieving cost savings in both area and power. However, exploiting this system versatility requires special attention to avoid performance degradation. Such technical aspects are addressed in this work, which is intended as a survey on reconfigurable hardware technology and aims at defining an open, standard and cost-effective system architecture driven by flexible coprocessors instantiated on demand on reconfigurable resources of an FPGA. This concept fits well with the functional features demanded by many embedded applications today, and its feasibility has been proved with a state-of-the-art commercial SRAM-based FPGA platform. The achieved results highlight dynamic partial reconfiguration as a potential technology to lead the next computing wave in the industry.

9.

Medusa: a scalable MR console using USB (cited by: 1; self-citations: 0; citations by others: 1)

Stang PP Conolly SM Santos JM Pauly JM Scott GC 《IEEE transactions on medical imaging》2012,31(2):370-379

Magnetic resonance imaging (MRI) pulse sequence consoles typically employ closed proprietary hardware, software, and interfaces, making any adaptation for innovative experimental technology difficult. Yet MRI systems research is trending toward higher channel count receivers, transmitters, gradient/shims, and unique interfaces for interventional applications. Customized console designs are now feasible for researchers with modern electronic components, but high data rates, synchronization, scalability, and cost present important challenges. Implementing large multichannel MR systems with efficiency and flexibility requires a scalable modular architecture. With Medusa, we propose an open system architecture using the universal serial bus (USB) for scalability, combined with distributed processing and buffering to address the high data rates and strict synchronization required by multichannel MRI. Medusa uses a modular design concept based on digital synthesizer, receiver, and gradient blocks, in conjunction with fast programmable logic for sampling and synchronization. Medusa is a form of synthetic instrument, being reconfigurable for a variety of medical/scientific instrumentation needs. The Medusa distributed architecture, scalability, and data bandwidth limits are presented, and its flexibility is demonstrated in a variety of novel MRI applications.

10.

VLSI Implementation of H.264 Video Decoder for Mobile Multimedia Application

Seong Mo Park Miyoung Lee Seungchul Kim Kyoung‐Seon Shin Igkyun Kim Hanjin Cho Heebum Jung Dukdong Lee 《ETRI Journal》2006,28(4):525-528

In this letter, we present a design of a single-chip video decoder called advanced mobile video ASIC (A‐MoVa) for mobile multimedia applications. This chip uses a mixed hardware/software architecture to improve both its performance and its flexibility. We designed the chip using a partition between the hardware and software blocks, and developed the architecture of an H.264 decoder based on the system‐on‐a‐chip (SoC) platform. This chip contains 290,000 logic gates and 670,000 memory gates, and its size is 7.5 mm × 7.5 mm (using 0.25 micron 4‐layer metal CMOS technology).

11.

PROTEUS-Lite project: dedicated to developing a telecommunication-oriented FPGA and its applications

Miyazaki T. Takahara A. Murooka T. Katayama M. Ichimori T. Shirakawa K. Tsutsui A. Fukami K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(4):401-414

This paper describes a project dedicated to developing an improved (in terms of usability) version of our previous telecommunication-oriented field programmable gate array (FPGA), and its applications. To achieve this goal, we adopt several challenging design strategies. First, we determine the new FPGA architecture based on a quantitative evaluation carried out to optimize the interaction between the FPGA and CAD algorithms. In addition, we create a new chip design environment that allows semi-automatic test pattern generation and cross-checking between logic and layout design. Furthermore, a dedicated CAD system is developed based on a consideration of the evaluation results and the characteristics of the FPGA. As a result of these design strategies, the FPGA and CAD system are well-balanced, and even though the FPGA has very rich routing resources, the routing process can be finished quickly without sacrificing application-circuit performance. The FPGA is applied to several reconfigurable systems for telecommunications, and is found to offer the required functions and good performance.

12.

Floating-Point FPGA: Architecture and Modeling

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(12):1709-1718

This paper presents an architecture for a reconfigurable device that is specifically optimized for floating-point applications. Fine-grained units are used for implementing control logic and bit-oriented operations, while parameterized and reconfigurable word-based coarse-grained units incorporating word-oriented lookup tables and floating-point operations are used to implement datapaths. In order to facilitate comparison with existing FPGA devices, the virtual embedded block scheme is proposed to model embedded blocks using existing field-programmable gate array (FPGA) tools. This methodology involves adopting existing FPGA resources to model the size, position, and delay of the embedded elements. The standard design flow offered by FPGA and computer-aided design vendors is then applied, and static timing analysis can be used to estimate the performance of the FPGA with the embedded blocks. On selected floating-point benchmark circuits, our results indicate that the proposed architecture can achieve a fourfold improvement in speed and a 25-fold reduction in area compared with a traditional FPGA device.

13.

Online Fault Tolerance for FPGA Logic Blocks

Emmert J.M. Stroud C.E. Abramovici M. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(2):216-226

Most adaptive computing systems use reconfigurable hardware in the form of field programmable gate arrays (FPGAs). For these systems to be fielded in harsh environments where high reliability and availability are a must, the applications running on the FPGAs must tolerate hardware faults that may occur during the lifetime of the system. In this paper, we present new fault-tolerant techniques for FPGA logic blocks, developed as part of the roving self-test areas (STARs) approach to online testing, diagnosis, and reconfiguration. Our techniques can handle large numbers of faults (we show tolerance of over 100 logic faults via actual implementation on an FPGA consisting of a 20 × 20 array of logic blocks). A key novel feature is the reuse of defective logic blocks to increase the number of effective spares and extend the mission life. To increase fault tolerance, we not only use nonfaulty parts of defective or partially faulty logic blocks, but we also use faulty parts of defective logic blocks in nonfaulty modes. By using and reusing faulty resources, our multilevel approach extends the number of tolerable faults beyond the number of currently available spare logic resources. Unlike many column, row, or tile-based methods, our multilevel approach can tolerate not only faults that are evenly distributed over the logic area, but also clusters of faults in the same local area. Furthermore, system operation is not interrupted for fault diagnosis or for computing fault-bypassing configurations. Our fault tolerance techniques have been implemented using ORCA 2C series FPGAs, which feature incremental dynamic runtime reconfiguration.

14.

Research on a User-Reconfigurable SoC Architecture Based on ARM7TDMI

黄嵩人虞致国魏敬和《电子器件》2008,31(3):1054-1057

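A user-reconfigurable system-on-chip architecture based on the ARM7TDMI embedded core is proposed. The architecture consists of the fixed logic of the ARM system, a reconfiguration control module, and a data bus interface control module. Compared with application-specific system chips or FPGAs, the architecture shows markedly greater flexibility and efficiency. Finally, the integrated development environment and co-simulation for the user-reconfigurable chip are discussed in detail.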

15.

Object-oriented domain specific compilers for programming FPGAs

Mencer O. Platzner M. Morf M. Flynn M.J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(1):205-210

Simplifying the programming models is paramount to the success of reconfigurable computing with field programmable gate arrays (FPGAs). This paper presents a methodology to combine true object-oriented design of the compiler/CAD tool with an object-oriented hardware design methodology in C++. The resulting system provides all the benefits of object-oriented design to the compiler/CAD tool designer and to the hardware designer/programmer. The two examples of domain-specific compilers presented are BSAT and StReAm. Each domain-specific compiler is targeted at a very specific application domain, such as applications that accelerate Boolean satisfiability problems with BSAT, and applications which lend themselves to implementation as a stream architecture with StReAm. The key benefit of the presented domain-specific compilers is a reduction of design time by orders of magnitude while keeping the optimal performance of hand-designed circuits.

16.

Optimization of Boundary Scan Tests Using FPGA-Based Efficient Scan Architectures

Igor Aleksejev Sergei Devadze Artur Jutman Konstantin Shibin 《Journal of Electronic Testing》2016,32(3):245-255

This paper presents a method for optimizing board-level scan tests with the help of reconfigurable scan chains (RSCs) implemented in the programmable logic of an FPGA. Although the RSC concept is a well-known solution for reducing scan-based test time, using RSCs may lead to unacceptable hardware overhead. Our work targets a completely new approach: exploiting on-board FPGA resources, which are unconfigured and therefore typically available during the manufacturing test phase, to carry out tests using temporarily implemented virtual RSC structures. As the allocated FPGA logic is reclaimed for functional use after the test is finished, the presented method delivers all the advantages of RSCs at no extra hardware cost. Experimental results show that the proposed virtual RSCs can fit into all available commercial FPGAs, providing a significant test time reduction in comparison with the state-of-the-art Boundary Scan test technique.

17.

Medium-Grain Cells for Reconfigurable DSP Hardware

Myjak M.J. Delgado-Frias J.G. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(6):1255-1265

Reconfigurable hardware contains an array of programmable cells and interconnection structures. Field-programmable gate arrays use fine-grain cells that implement simple logic functions. Some proposed reconfigurable architectures for digital signal processing (DSP) use coarse-grain cells that perform 16-b or 32-b operations. A third alternative is to use medium-grain cells with a word length of 4 or 8 b. This approach combines high flexibility with inherent support for binary arithmetic such as multiplication. This paper presents two medium-grain cells for reconfigurable DSP hardware. Both cells contain an array of small lookup tables, or "elements", that can assume two structures. In memory mode, the elements act as a random-access memory. In mathematics mode, the elements implement 4-b arithmetic operations. The first design uses a matrix of 4 × 4 elements and operates in bit-parallel fashion. The second design uses an array of five elements and computes arithmetic functions in bit-serial fashion. Layout simulations in 180-nm CMOS indicate that the parallel cell operates at 267 MHz, whereas the serial cell runs at 167 MHz. However, the parallel design requires over twice the area. The proposed medium-grain cells provide the performance and flexibility needed to implement DSP. To evaluate the designs, the paper estimates the execution time and resource utilization for common benchmarks such as the fast Fourier transform. The architecture model used in this analysis combines the cells with a pipelined hierarchical interconnection network. The end results show great promise compared to other devices, including field-programmable gate arrays.
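As a rough illustration of why a 4-b word length gives inherent support for multiplication, the sketch below composes an 8-b × 8-b product from four 4-b × 4-b partial products read out of a small lookup table. This is only the generic decomposition; the table `MUL4` and the function `mul8` are hypothetical names, and nothing here reproduces the cell designs described in the paper.

```python
# 16 x 16 table of 4-b x 4-b products, standing in for small lookup-table
# "elements" configured for arithmetic (illustrative assumption).
MUL4 = [[a * b for b in range(16)] for a in range(16)]


def mul8(x, y):
    """Multiply two 8-b unsigned values using only 4-b x 4-b table lookups,
    shifts, and additions."""
    x_hi, x_lo = x >> 4, x & 0xF
    y_hi, y_lo = y >> 4, y & 0xF
    low = MUL4[x_lo][y_lo]                      # weight 2^0
    mid = MUL4[x_hi][y_lo] + MUL4[x_lo][y_hi]   # weight 2^4
    high = MUL4[x_hi][y_hi]                     # weight 2^8
    return (high << 8) + (mid << 4) + low


print(mul8(200, 123) == 200 * 123)
```

A bit-parallel cell would evaluate the partial products at once, while a bit-serial one would trade area for more cycles; the arithmetic identity is the same in both cases.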

18.

FPGA prototyping of a RISC processor core for embedded applications

Gschwind M. Salapura V. Maurer D. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(2):241-250

Application-specific processors offer an attractive option in the design of embedded systems by providing high performance for a specific application domain. In this work, we describe the use of a reconfigurable processor core based on a RISC architecture as the starting point for application-specific processor design. By using a common base instruction set, development cost can be reduced and design space exploration is focused on the application-specific aspects of performance. An important aspect of deploying any new architecture is verification, which usually requires lengthy software simulation of a design model. We show how hardware emulation based on programmable logic can be integrated into the hardware/software codesign flow. While previously hardware emulation required massive investment in design effort and special-purpose emulators, an emulation approach based on high-density field-programmable gate array (FPGA) devices now makes hardware emulation practical and cost-effective for embedded processor designs. To reduce development cost and avoid duplication of design effort, FPGA prototypes and ASIC implementations are derived from a common source; we show how to perform targeted optimizations to fully exploit the capabilities of the target technology while maintaining a common source base.

19.

Design and Implementation of FPGA-Based Hardware Encryption (cited by: 1; self-citations: 1; citations by others: 0)

武一郭婷婷《电视技术》2014,38(5)

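With a Cyclone II series FPGA chip at its core, an FPGA hardware platform is built, and a DES and AES encryption/decryption design scheme that prioritizes resource usage is proposed. By analyzing the nonlinear characteristics of the S-box, a new composite-field transformation is constructed that avoids the resource cost incurred by isomorphic transformations. During encryption and decryption, the hardware structure of the round function is reused to minimize hardware resource usage. The overall design adopts an embedded pipeline structure, which reduces logic complexity while raising processing speed. Experimental results verify that the resource utilization of the FPGA hardware encryption is far lower than that of ASIC hardware encryption, that the execution speed reaches the Gbit/s level, and that encryption performance is greatly improved.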

20.

Reconfigurable Architecture for Network Flow Analysis

Yusuf S. Luk W. Sloman M. Dulay N. Lupu E.C. Brown G. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(1):57-65

This paper describes a reconfigurable architecture based on field-programmable gate-array (FPGA) technology for monitoring and analyzing network traffic at increasingly high network data rates. Our approach maps the performance-critical tasks of packet classification and flow monitoring into reconfigurable hardware, such that multiple flows can be processed in parallel. We explore the scalability of our system, showing that it can support flows at multi-gigabit rates; this is faster than most software-based solutions, where acceptable data rates are typically no more than 100 million bits per second.
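The two tasks named in this abstract, packet classification and flow monitoring, can be sketched in software as a table keyed by a flow identifier. The abstract does not say how flows are identified; the 5-tuple key, the packet dictionary layout, and the function names below are assumptions made for illustration, and the sequential loop stands in for work the hardware would do in parallel.

```python
from collections import defaultdict


def flow_key(pkt):
    """Classify a packet into a flow by its 5-tuple (an assumed definition)."""
    return (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])


def monitor(packets):
    """Accumulate per-flow packet and byte counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        entry = flows[flow_key(pkt)]
        entry["packets"] += 1
        entry["bytes"] += pkt["length"]
    return dict(flows)


# Hypothetical traffic: two packets of one TCP flow and one UDP packet.
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80,
     "proto": "tcp", "length": 1500},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80,
     "proto": "tcp", "length": 400},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "sport": 5353, "dport": 53,
     "proto": "udp", "length": 90},
]
stats = monitor(packets)
for key, counters in stats.items():
    print(key, counters)
```

In a hardware mapping the table lookup and counter update would typically be pipelined so that independent flows do not wait on one another.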