期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Architecture and operating system support for two-dimensional runtime partial reconfiguration

Fei Wang Jack Jean 《The Journal of supercomputing》2012,59(2):610-635

Reconfigurable machines based on field programmable gate array (FPGA) chips adapt to applications’ needs through hardware reconfiguration. Partial reconfiguration allows the configuration of a portion of a chip while the rest of the chip is busy working on tasks. This paper considers a two-dimensional partially reconfigurable FPGA chip that allows the dynamic swap in and out of circuit modules. Such a chip supports the concurrent execution of multiple applications or an application that is otherwise too large to fit. A challenging issue for 2-D runtime partial reconfiguration is how to support the efficient connection, or routing, between circuit modules or between modules and I/O pins, when those modules may be placed on any area of a chip. Because commercial chips are not efficient in 2-D runtime routing, a new FPGA architecture is proposed based on an array of clusters of configurable logic blocks and a mesh of segmented buses. To evaluate the runtime performance of the architecture, an operating system is specified and implemented which takes care of the scheduling, placement, and routing of circuits on the architecture. Simulation is used to evaluate the efficiency of the OS kernel and to determine the optimal cluster size of the architecture. 相似文献

2.

A reconfigurable HexCell-based systolic array architecture for evolvable hardware on FPGA

《Microprocessors and Microsystems》2020

Evolvable hardware is a system that modifies its architecture and behavior to adapt with changes of the environment. It is formed by reconfigurable processing elements driven by an evolutionary algorithm. In this paper, we study a reconfigurable HexCell-based systolic array architecture for evolvable systems on FPGA. HexCell is a processing element with a tile-able hexagonal-shaped cell for reconfigurable systolic arrays on FPGAs. The cell has three input ports feed into an internal functional-unit connected to three output ports. The functional-unit is configured using dynamic partial reconfiguration (DPR), and the output ports, in contrast, are configured using virtual reconfiguration circuit (VRC). Our proposed architecture combines the merits of both DPR and VRC to achieve fast reconfiguration and accelerated evolution. A HexCell-based 4 × 4 array was implemented on FPGA and utilized 32.5% look-up tables, 31.3% registers, and 1.4% block RAMs of Artix-7 (XC7Z020) while same-size conventional array consumed 8.7%, 5.1%, and 20.7% of the same FPGA, respectively. As a case study, we used an adaptive image filter as a test application. Results showed that the fitness of the best filters generated by our proposed architecture were generally fitter than those generated by the conventional state-of-the-art systolic array on the selected application. Also, performing 900,000 evaluations on HexCell array was 2.6 × faster than the conventional one. 相似文献

3.

基于JBits的一种可重构数据处理系统可靠性研究 总被引：1，自引：0，他引：1

任小西李仁发金声震张克环吴强《计算机研究与发展》2007,44(4):722-728

空间太阳望远镜(SST)是一颗对太阳进行观测的科学卫星,它使用FPGA芯片对每天采集的大量数据进行预处理.高昂的建造费用和恶劣的工作环境,确保SST数据的高可靠性成为一项艰巨任务.改进了常规TMR结构,提出一种基于配置数据的可重构硬件故障检测和修复方法,使用JBits工具简化对配置数据的各种操作.此结构和方法能及时检测到故障,通过硬件重构消除故障,提高系统可靠性.采用Markov过程理论对系统可靠性进行分析,结果表明可靠性可得到显著提高. 相似文献

4.

A task graph execution manager for reconfigurable multi-tasking systems

Juan Antonio Clemente Carlos González Javier Resano Daniel Mozos 《Microprocessors and Microsystems》2010,34(2-4):73-83

Reconfigurable hardware can be used to build multi-tasking systems that dynamically adapt themselves to the requirements of the running applications. This is especially useful in embedded systems, since the available resources are very limited and the reconfigurable hardware can be reused for different applications. In these systems computations are frequently represented as task graphs that are executed taking into account their internal dependencies and the task schedule. The management of the task graph execution is critical for the system performance. In this regard, we have developed two different versions, a software module and a hardware architecture, of a generic task graph execution manager for reconfigurable multi-tasking systems. The second version reduces the run-time management overheads by almost two orders of magnitude. Hence it is especially suitable for systems with exigent timing constraints. Both versions include specific support to optimize the reconfiguration process. 相似文献

5.

可重构密码处理结构中重构机制及对性能的影响

姜晶菲倪晓强张民选《计算机工程与应用》2004,40(20):14-16,19

重构机制对可重构密码处理系统的性能有着重要的影响,该文从全局、局部、静态、动态几方面提出了流水化可重构密码处理结构中重构机制的分类,给出了各种机制的吞吐率和延迟公式,并分析了几种机制的性能和实现代价,最后给出了在采用局部动态重构机制的可重构密码处理结构中密码处理的性能。相似文献

6.

Architecture design of variable block size motion estimation for full and fast search algorithms in H.264/AVC

Xuanxing Xiong Author VitaeYang Song Author Vitae Ali Akoglu^{Author Vitae} 《Computers & Electrical Engineering》2011,37(3):285-299

Fast search algorithms (FSA) used for variable block size motion estimation follow irregular search (data access) patterns. This poses as the main challenge in designing hardware architectures for them. In this study, we build a baseline architecture for fast search algorithms using state-of-the-art components available in academia. We improve its performance by introducing: (1) a super 2-dimensional (2-D) random access memory architecture for reading regular and interleaved two-rows or two-columns as opposed to one-row or one-column accessibility of the state of the art; (2) a 2-D processing element array with a tuned interconnect to support neighborhood connections required by the conventional fast search algorithms and to exploit on-chip data reuse. Results show that our design increases system throughput by up to 85.47%, and achieves power reduction by up to 13.83% with an area increase in the worst case by up to 65.53% compared to the baseline architecture. 相似文献

7.

一种基于块匹配算法的SAD运算加速器

谷会涛陈书明《计算机工程与科学》2012,34(7):60-64

基于块匹配算法的运动估计是图像和视频应用中的关键技术。SAD运算是运动估计中最主要的运算形式,具有极高的计算复杂度和传输带宽需求。本文提出了一种可配置的SAD运算加速器结构,采用一个16×1规模的PE阵列和一个加法树结构加速SAD运算的执行。本文将PE阵列和加法树结构的流水线进行细致划分,有效提高了工作频率。加速器采用DMA事件机制,大部分的数据传输可以与SAD计算并行进行,减少了数据传输延迟引起的性能下降。实验结果显示,搜索16×16大小的搜索窗口,本文结构只需要4102个周期。基于SMIC0.13μm的CMOS标准单元工艺对本文结构进行综合,最高工作频率可达到750MHz,面积约为16.8k门和3.5KB的片上存储器。相似文献

8.

An instruction-level distributed processor for symmetric-key cryptography

Elbirt A.J. Paar C. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(5):468-480

Efficient implementation of block ciphers is critical toward achieving both high security and high-speed processing. Numerous block ciphers have been proposed and implemented, using a wide and varied range of functional operations. Existing architectures such as microcontrollers do not provide this broad range of support. Therefore, we will present a hardware architecture that achieves efficient block cipher implementation while maintaining flexibility through reconfiguration. In an effort to achieve such a hardware architecture, a study of a wide range of block ciphers was undertaken to develop an understanding of the functional requirements of each algorithm. This study led to the development of COBRA, a reconfigurable architecture for the efficient implementation of block ciphers. A detailed discussion of the top-level architecture, interconnection scheme, and underlying elements of the architecture will be provided. System configuration and on-the-fly reconfiguration will be analyzed, and from this analysis, it will be demonstrated that the COBRA architecture satisfies the requirements for achieving efficient implementation of a wide range of block ciphers that meet the 622 Mbps ATM network encryption throughput requirement. 相似文献

9.

动态重构硬件加速中的性能开销建模

下载免费PDF全文

苑福利宫磊娄文启陈香兰《计算机工程与应用》2022,58(6):69-79

近年来,随着可重构计算方法和可重构硬件特性的不断演进,基于FPGA动态部分重构技术构建运行时可重构加速器已经成为解决传统加速器设计中硬件资源限制问题的重要途径.然而,区别于传统静态重构加速器,FPGA的动态重构开销是影响硬件加速整体性能的重要因素,而目前尚缺少能够在可重构硬件设计的早期阶段进行动态重构开销精确估算的相关... 相似文献

10.

Real-time embedded systems powered by FPGA dynamic partial self-reconfiguration: a case study oriented to biometric recognition applications

Francisco Fons Mariano Fons Enrique Cantó Mariano López 《Journal of Real-Time Image Processing》2013,8(3):229-251

This work aims to pave the way for an efficient open system architecture applied to embedded electronic applications to manage the processing of computationally complex algorithms at real-time and low-cost. The target is to define a standard architecture able to enhance the performance-cost trade-off delivered by other alternatives nowadays in the market like general-purpose multi-core processors. Our approach, sustained by hardware/software (HW/SW) co-design and run-time reconfigurable computing, is synthesizable in SRAM-based programmable logic. As proof-of-concept, a run-time partially reconfigurable field-programmable gate array (FPGA) is addressed to carry out a specific application of high-demanding computational power such as an automatic fingerprint authentication system (AFAS). Biometric personal recognition is a good example of compute-intensive algorithm composed of a series of image processing tasks executed in a sequential order. In our pioneer conception, these tasks are partitioned and synthesized first in a series of coprocessors that are then instantiated and executed multiplexed in time on a partially reconfigurable region of the FPGA. The implementation benchmark of the AFAS either as a pure software approach on a PC platform under a dual-core processor (Intel Core 2 Duo T5600 at 1.83 GHz) or as a reconfigurable FPGA co-design (identical algorithm partitioned in HW/SW tasks operating at 50 or 100 MHz on the second smallest device of the Xilinx Virtex-4 LX family) highlights a speed-up of one order of magnitude in favor of the FPGA alternative. These results let point out biometric recognition as a sensible killer application for run-time reconfigurable computing, mainly in terms of efficiently balancing computational power, functional flexibility and cost. Such features, reached through partial reconfiguration, are easily portable today to a broad range of embedded applications with identical system architecture. 相似文献

11.

基于CAN总线的可重构SHM系统设计

下载免费PDF全文

郑伟崔荣荣路萍《计算机工程》2010,36(15):228-229,232

针对结构健康监测(SHM)系统现场节点功能固化,不便于机动配置及后期维护等问题,提出一种基于CAN总线的可重构SHM系统架构,设计节点功能重构、网络结构重构以及资源分配重构等技术。以功能适配接口及嵌入式操作系统的软硬件协同实现节点功能重构,以自组织特征映射网实现网络结构重构的优先级聚类,以基于组件对象模型的上位监控软件实现资源的按需分配。利用该方法设计的系统具有灵活、高效和一定自主性等特点。相似文献

12.

高速可配置RS纠删解码器的VLSI设计

下载免费PDF全文

侯大志张孝双蒋洪晖《计算机工程》2011,37(22):215-218

目前对可配置纠错与删除(纠删)解码器研究较少。为此,采用性能优异的RS编码方法,提出一种高速可配置RS纠删解码器的超大规模集成电路(VLSI)架构,并详述可配置纠删BM模块的构成。该架构通过折叠技术,使解码器在保证高速的前提下降低硬件复杂度。通过0.18 μm工艺和Design Complier工具综合测试结果表明,与同类解码器研究相比,该解码器在硬件复杂度吞吐率和可配置性方面,均具有较大优势。相似文献

13.

Multi-level reconfigurable architectures in the switch model

Sebastian Lange Martin Middendorf 《Journal of Systems Architecture》2010,56(2-3):103-115

In this paper, we propose a concept for multi-level reconfigurable architectures with more than two levels of reconfiguration, and study these architectures theoretically and experimentally. The proposed architectures are extensions of 2-level reconfigurable architectures where the reconfiguration operations on the lowest level correspond to the reconfiguration operations of standard 1-level reconfigurable architectures, and the reconfigurable units are simple switches. It is shown that finding an optimal number of reconfiguration levels and a corresponding reconfiguration scheme that minimizes the number of reconfiguration bits for a given algorithm can be done in polynomial time. But finding the optimal number of reconfiguration levels is NP-hard for heterogeneous multi-level architectures, where the number of reconfiguration levels varies for the different reconfigurable units. Experimental results for different test applications show that 3–4 reconfiguration levels are optimal with respect to the number of reconfiguration bits needed. The number of reconfiguration bits is reduced by 35–86% compared to 1-level reconfiguration and by 8–34% compared to 2-level reconfiguration. The heterogeneous architecture reduces the number of necessary reconfiguration bits by additional 1–5% and also needs less SRAM cells. 相似文献

14.

On the practicality of using intrinsic reconfiguration for fault recovery

Greenwood G.W. 《Evolutionary Computation, IEEE Transactions on》2005,9(4):398-405

Evolvable hardware (EHW) combines the powerful search capability of evolutionary algorithms with the flexibility of reprogrammable devices, thereby providing a natural framework for reconfiguration. This framework has generated an interest in using EHW for fault-tolerant systems because reconfiguration can effectively deal with hardware faults whenever it is impossible to provide spares. But systems cannot tolerate faults indefinitely, which means reconfiguration does have a deadline. The focus of previous EHW research relating to fault-tolerance has been primarily restricted to restoring functionality, with no real consideration of time constraints. In this paper, we are concerned with EHW performing reconfiguration under deadline constraints. In particular, we investigate reconfigurable hardware that undergoes intrinsic evolution. We show that fault recovery done by intrinsic reconfiguration has some restrictions, which designers cannot ignore. 相似文献

15.

Resource management and task partitioning and scheduling on a run-time reconfigurable embedded system

Radha Guha^{Author Vitae} Nader Bagherzadeh Author Vitae Author Vitae 《Computers & Electrical Engineering》2009,35(2):258-285

There are many design challenges in the hardware-software co-design approach for performance improvement of data-intensive streaming applications with a general-purpose microprocessor and a hardware accelerator. These design challenges are mainly to prevent hardware area fragmentation to increase resource utilization, to reduce hardware reconfiguration cost and to partition and schedule the tasks between the microprocessor and the hardware accelerator efficiently for performance improvement and power savings of the applications.In this paper a modular and block based hardware configuration architecture named memory-aware run-time reconfigurable embedded system (MARTRES) is proposed for efficient resource management and performance improvement of streaming applications. Subsequently we design a task placement algorithm named hierarchical best fit ascending (HBFA) algorithm to prove that MARTRES configuration architecture is very efficient in increased resource utilization and flexible in task mapping and power savings. The time complexity of HBFA algorithm is reduced to O(n) compared to traditional Best Fit (BF) algorithm’s time complexity of O(n²), when the quality of the placement solution by HBFA is better than that of BF algorithm. Finally we design an efficient task partitioning and scheduling algorithm named balanced partitioned and placement-aware partitioning and scheduling algorithm (BPASA). In BPASA we exploit the temporal parallelism in streaming applications to reduce reconfiguration cost of the hardware, while keeping in mind the required throughput of the output data. We balance the exploitation of spatial parallelism and temporal parallelism in streaming applications by considering the reconfiguration cost vs. the data transfer cost. The scheduler refers to the HBFA placement algorithm to check whether contiguous area on FPGA is available before scheduling the task for HW or for SW. 相似文献

16.

HEVC中率失真优化算法的动态可重构实现

杨坤蒋林谢晓燕邓军勇刘新闯胡传瞻《计算机工程与科学》2021,43(2):354-361

基于视频阵列处理器高效视频编码HEVC实现中,HEVC灵活的编码块增加了率失真优化算法硬件实现的难度,难以实现阵列规模和不同块的灵活切换.针对这一问题,提出一种动态可重构的率失真优化实现方法.基于上下文切换的动态重构机制,完成不同规模、不同块大小算法之间的灵活切换,并以率失真优化算法作为帧内模式选择的判别依据,实现帧内... 相似文献

17.

可重构资源管理及硬件任务布局的算法研究 总被引：1，自引：0，他引：1

李涛杨愚鲁《计算机研究与发展》2008,45(2):375-382

可重构系统具有微处理器的灵活性和接近于ASIC的计算速度,可重构硬件的动态部分重构能力能够实现计算和重构操作的重叠,使系统能够动态地改变运行任务,可重构资源管理和硬件任务布局方法是提高可重构系统性能的关键.提出了基于任务上边界计算最大空闲矩形的算法(TT-KAMER),能够有效地管理系统的空闲可重构资源;在此基础上使用FF和启发式BF算法进行硬件任务的布局.实验表明,算法能够有效地实现在线资源分配与任务布局,获得较高的资源利用率. 相似文献

18.

Mobile satellite reception with a virtual satellite dish based on a reconfigurable multi-processor architecture

M.D. van de Burgwal K.C. Rovers K.C.H. Blom A.B.J. Kokkeler G.J.M. SmitAuthor vitae 《Microprocessors and Microsystems》2011,35(8):716-728

Traditionally, mechanically steered dishes or analog phased array beamforming systems have been used for radio frequency receivers, where strong directivity and high performance were much more important than low-cost requirements. Real-time controlled digital phased array beamforming could not be realized due to the high computational requirements and the implementation costs. Today, digital hardware has become powerful enough to perform the massive number of operations required for real-time digital beamforming. With the continuously decreasing price per transistor, high performance signal processing has become available by using multi-processor architectures. More and more applications are using beamforming to improve the spatial utilization of communication channels, resulting in many dedicated digital architectures for specific applications. By using a reconfigurable architecture, a single hardware platform can be used for different applications with different processing needs.In this article, we show how a reconfigurable multi-processor system-on-chip based architecture can be used for phased array processing, including an advanced tracking mechanism to continuously receive signals with a mobile satellite receiver. An adaptive beamformer for DVB-S satellite reception is presented that uses an Extended Constant Modulus Algorithm to track satellites. The receiver consists of 8 antennas and is mapped on three reconfigurable Montium TP processors. With a scenario based on a phased array antenna mounted on the roof of a car, we show that the adaptive steering algorithm is robust in dynamic scenarios and correctly demodulates the received signal. 相似文献

19.

UML-based hardware/software co-design platform for dynamically partially reconfigurable network security systems

Chun-Hsian Huang Pao-Ann Hsiung Jih-Sheng Shen 《Journal of Systems Architecture》2010,56(2-3):88-102

The dynamic partial reconfiguration technology of FPGA has made it possible to adapt system functionalities at run-time to changing environment conditions. However, this new dimension of dynamic hardware reconfigurability has rendered existing CAD tools and platforms incapable of efficiently exploring the design space. As a solution, we proposed a novel UML-based hardware/software co-design platform (UCoP) targeting at dynamically partially reconfigurable network security systems (DPRNSS). Computation-intensive network security functions, implemented as reconfigurable hardware functions, can be configured on-demand into a DPRNSS at run-time. Thus, UCoP not only supports dynamic adaptation to different environment conditions, but also increases hardware resource utilization. UCoP supports design space exploration for reconfigurable systems in three folds. Firstly, it provides reusable models of typical reconfigurable systems that can be customized according to user applications. Secondly, UCoP provides a partially reconfigurable hardware task template, using which users can focus on their hardware designs without going through the full partial reconfiguration flow. Thirdly, UCoP provides direct interactions between UML system models and real reconfigurable hardware modules, thus allowing accurate time measurements. Compared to the existing lower-bound and synthesis-based estimation methods, the accurate time measurements using UCoP at a high abstraction level can more efficiently reduce the system development efforts. 相似文献

20.

知识可编程智能芯片系统:概念、架构与展望

张俊王飞跃《模式识别与人工智能》2018,31(10):869-876

本文原创性地提出知识可编程智能芯片系统(KPI-CS)及其理论和工程体系.该系统在当前最先进的异构计算和可重构人工智能(AI)芯片技术的基础上,深度融合复杂系统工程理论、知识工程理论与技术、半导体芯片研发技术、人工智能可重构算法技术,提出基于知识的可重构智能芯片和计算系统平台技术.该系统旨在支持AI应用场景适应性、AI系统重构灵活性、AI算法算力合理性的平行智能AI芯片系统平台和对应的知识服务平台.同时,作为应用展望,KPI-CS与相应的应用平台联动,为平行复杂系统管理与控制、智能交通、智能能源、平行区块链、智能医疗等研究领域和工程实践提供新一代的实时、高效、自适应的计算系统支撑. 相似文献