1.

苑福利 宫磊 娄文启 陈香兰 《Computer Engineering and Applications》(计算机工程与应用) 2022,58(6):69-79

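In recent years, as reconfigurable computing methods and reconfigurable hardware features have continued to evolve, building run-time reconfigurable accelerators on FPGA dynamic partial reconfiguration technology has become an important way to overcome the hardware resource limits of traditional accelerator design. However, unlike in traditional statically reconfigured accelerators, the dynamic reconfiguration overhead of the FPGA is an important factor in overall hardware acceleration performance, yet still lacking, for accurate estimation of dynamic reconfiguration overhead in the early stages of reconfigurable hardware design, are the relevant...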

2.

A deadlock-free routing algorithm for dynamically reconfigurable Networks-on-Chip

Chris Jackson Simon J. Hollis 《Microprocessors and Microsystems》2011,35(2):139-151

We address routing in Networks-On-Chip (NoC) architectures that use irregular mesh topologies with Long-Range Links (LRL). These topologies create difficult conditions for routing algorithms, as standard algorithms assume a static, regular link structure and exploit the uniformity of regular meshes to avoid deadlock and maintain routability. We present a novel routing algorithm that can cope with these irregular topologies and adapt to run-time LRL insertion and topology reconfiguration. Our approach to accommodate dynamic topology reconfiguration is to use a new technique that decomposes routing relations into two stages: the calculation of output ports on the current minimal path and the application of routing restrictions designed to prevent deadlock. In addition, we present a selection function that uses local topology data to adaptively select optimal paths. The routing algorithm is shown to be deadlock-free, after which an analysis of all possible routing decisions in the region of an LRL is carried out. We show that the routing algorithm minimises the cost of sub-optimally placed LRLs and display the hop savings available. When applied to LRLs of less than seven hops, the overall traffic hop count and associated routing energy cost are reduced. In a simulated 8 × 8 network the total input buffer usage across the network was reduced by 6.5%.

3.

Real-time embedded systems powered by FPGA dynamic partial self-reconfiguration: a case study oriented to biometric recognition applications

Francisco Fons Mariano Fons Enrique Cantó Mariano López 《Journal of Real-Time Image Processing》2013,8(3):229-251

This work aims to pave the way for an efficient open system architecture applied to embedded electronic applications to manage the processing of computationally complex algorithms in real time and at low cost. The target is to define a standard architecture able to enhance the performance-cost trade-off delivered by other alternatives on the market today, such as general-purpose multi-core processors. Our approach, sustained by hardware/software (HW/SW) co-design and run-time reconfigurable computing, is synthesizable in SRAM-based programmable logic. As proof-of-concept, a run-time partially reconfigurable field-programmable gate array (FPGA) is addressed to carry out a specific application demanding high computational power, namely an automatic fingerprint authentication system (AFAS). Biometric personal recognition is a good example of a compute-intensive algorithm composed of a series of image processing tasks executed in sequential order. In our pioneering conception, these tasks are first partitioned and synthesized into a series of coprocessors that are then instantiated and executed, multiplexed in time, on a partially reconfigurable region of the FPGA. The implementation benchmark of the AFAS either as a pure software approach on a PC platform under a dual-core processor (Intel Core 2 Duo T5600 at 1.83 GHz) or as a reconfigurable FPGA co-design (identical algorithm partitioned in HW/SW tasks operating at 50 or 100 MHz on the second smallest device of the Xilinx Virtex-4 LX family) highlights a speed-up of one order of magnitude in favor of the FPGA alternative. These results point to biometric recognition as a sensible killer application for run-time reconfigurable computing, mainly in terms of efficiently balancing computational power, functional flexibility and cost. Such features, reached through partial reconfiguration, are easily portable today to a broad range of embedded applications with identical system architecture.

4.

Energy efficient processing of motion estimation for embedded multimedia systems

Jooheung Lee 《Multimedia Tools and Applications》2017,76(23):24749-24765

Visual sensor networks require low-power compression of the large amounts of video data in each camera node because of their energy-constrained and bandwidth-limited environments. In this paper, an energy-efficient architecture for Variable Block Size Motion Estimation is proposed to fully utilize the dynamic partial reconfiguration capability of programmable hardware fabric in distributed embedded vision processing nodes. Partial reconfiguration of the FPGA is exploited to support run-time reconfiguration of the proposed modular hardware architecture for motion estimation. According to the required search range, hardware reconfiguration is performed adaptively to reduce hardware resources and power consumption. A reconfigurable ME, ranging from a simple 1-D to a complex 2-D Sum of Absolute Differences (SAD) array performing full search block matching, is selected in order to support different search window sizes. The implemented scalable SAD array can provide different resolutions and frame rates for real-time applications with multiple reconfigurable regions.
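
The full-search block matching with a SAD cost mentioned in the abstract above can be illustrated in software. The following is only a minimal sketch, not the paper's hardware architecture: the function names (`sad`, `get_block`, `full_search`), the list-of-lists frame layout and the square search window are all assumptions made for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(cur, ref, top, left, size, search_range):
    """Exhaustively test every displacement within +/- search_range.

    Returns (dy, dx, cost) for the candidate block in `ref` with the lowest
    SAD against the block at (top, left) in `cur`, or None if no candidate
    lies fully inside the reference frame.
    """
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(ref, y, x, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

The number of candidates grows with the square of `search_range`, which is consistent with the abstract's point that adapting the hardware to the required search range can save resources and power.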

5.

Design space exploration for partially reconfigurable architectures in real-time systems

《Journal of Systems Architecture》2013,59(8):571-581

In this paper, we introduce FoRTReSS (Flow for Reconfigurable archiTectures in Real-time SystemS), a methodology for the generation of partially reconfigurable architectures with real-time constraints, enabling Design Space Exploration (DSE) at the early stages of the development. FoRTReSS can be completely integrated into existing partial reconfiguration flows to generate physical constraints describing the architecture in terms of reconfigurable regions that are used to floorplan the design, with key metrics such as partially reconfigurable area, real-time or external fragmentation. The flow is based upon our SystemC simulator for real-time systems that helps develop and validate scheduling algorithms with respect to application timing constraints and partial reconfiguration physical behaviour. We tested our approach with a video stream encryption/decryption application together with Error Correcting Code and showed that partial reconfiguration may lead to an area improvement up to 38% on some resources without compromising application performance, in a very small amount of time: less than 30 s.

6.

FPGA-based dynamically reconfigurable shortwave transceiver

刘彬 赵明生 《Application of Electronic Technique》(电子技术应用) 2009,35(10)

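Using module-based dynamic partial reconfiguration, an FPGA-based dynamically reconfigurable software-defined radio platform was built, and a dynamically reconfigurable MIL-STD-188-110B shortwave transceiver system was designed on that platform. Compared with a conventional globally, statically reconfigured system, the dynamically reconfigurable system scales better, configures faster, needs less storage for configuration bitstreams, and offers more flexible configuration control.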

7.

UML-based hardware/software co-design platform for dynamically partially reconfigurable network security systems

Chun-Hsian Huang Pao-Ann Hsiung Jih-Sheng Shen 《Journal of Systems Architecture》2010,56(2-3):88-102

The dynamic partial reconfiguration technology of FPGA has made it possible to adapt system functionalities at run-time to changing environment conditions. However, this new dimension of dynamic hardware reconfigurability has rendered existing CAD tools and platforms incapable of efficiently exploring the design space. As a solution, we proposed a novel UML-based hardware/software co-design platform (UCoP) targeting dynamically partially reconfigurable network security systems (DPRNSS). Computation-intensive network security functions, implemented as reconfigurable hardware functions, can be configured on-demand into a DPRNSS at run-time. Thus, UCoP not only supports dynamic adaptation to different environment conditions, but also increases hardware resource utilization. UCoP supports design space exploration for reconfigurable systems in three ways. Firstly, it provides reusable models of typical reconfigurable systems that can be customized according to user applications. Secondly, UCoP provides a partially reconfigurable hardware task template, using which users can focus on their hardware designs without going through the full partial reconfiguration flow. Thirdly, UCoP provides direct interactions between UML system models and real reconfigurable hardware modules, thus allowing accurate time measurements. Compared to the existing lower-bound and synthesis-based estimation methods, the accurate time measurements using UCoP at a high abstraction level can more efficiently reduce the system development efforts.

8.

Dynamic objects: Supporting fast and easy run-time reconfiguration in FPGAs

《Journal of Systems Architecture》2013,59(1):1-15

Partial reconfiguration capabilities must be exploited to obtain the maximum benefits from dynamically reconfigurable FPGAs. Partial reconfiguration process management still faces a set of open problems that have thus far made it impossible to take full advantage of partial and dynamic reconfiguration. The work presented in this article proposes a novel architecture, development workflow, and modelling approach for dynamically reconfigurable systems management using an object model that offers a global solution. This solution is built on a system-level middleware that provides the infrastructure and tools for communication between different components in heterogeneous embedded systems. Several experiments were performed to test and evaluate each part of our proposed solution, and the obtained results are presented. These results demonstrate the excellent performance of our proposed solution.

9.

Biometrics-based consumer applications driven by reconfigurable hardware architectures

M. Fons F. Fons E. Cantó 《Future Generation Computer Systems》2012,28(1):268-286

The development of automatic biometrics-based personal recognition systems is a reality in the current technological age. Not only those applications demanding stringent security levels but also many daily use consumer applications request the existence of high performance computational platforms in charge of recognizing the identity of an individual based on the analysis of his/her physiological or behavioural characteristics. The state of the art points out two main open problems in the implementation of such automatic applications: on the one hand, the needed improvement of the reliability level of the existing recognition systems in terms of accuracy, security and real-time performance; on the other hand, the cost reduction of those physical platforms in charge of the processing. This work addresses those limitations of current systems and aims at finding the proper system architecture to develop this kind of high-performance application at low cost. Because of that, those existing solutions based on expensive multiprocessor systems like HPC (High Performance Computer), GPU (Graphics Processing Unit), or PC (Personal Computer) platforms need to be discarded, and in their place embedded system solutions based on programmable logic devices are suggested in this work. The programmability of FPGA (Field Programmable Gate Array) devices together with the inherent parallelism of hardware design provides the needed flexibility to develop made-to-measure coprocessors in charge of accelerating those time-critical computational tasks. To address the cost of the system, dynamically reconfigurable FPGAs are suggested in this work. The scheduling of the recognition application into a series of mutually exclusive tasks, and the reutilization of those functional resources available in the FPGA by multiplexing different coprocessors in the same area along the application execution time, allows reducing the size of the device and therefore its cost at the expense of the reconfiguration overhead. The hardware-software co-design of an AFAS (automatic fingerprint-based authentication system) under two different run-time reconfigurable platforms is presented as the proof of concept of the suggested architecture. The outstanding results achieved in this work pave the way for the implementation of biometric applications by means of run-time reconfigurable FPGAs.

10.

A coarse-grain reconfigurable architecture for multimedia applications supporting subword and floating-point calculations

Claudio Brunelli Fabio Garzia Davide Rossi Jari Nurmi 《Journal of Systems Architecture》2010,56(1):38-47

Signal processors exploiting ASIC acceleration suffer from sky-rocketing manufacturing costs and long design cycles. FPGA-based systems provide a programmable alternative for exploiting computation parallelism, but the flexibility they provide is not as high as in processor-oriented architectures: HDL or C-to-HDL flows still require specific expertise and a hardware knowledge background. On the other hand, the large size of the configuration bitstream and the inherent complexity of FPGA devices make their dynamic reconfiguration not a very viable approach. Coarse-grained reconfigurable architectures (CGRAs) are an appealing solution but they pose implementation problems and tend to be application specific. This paper presents a scalable CGRA which eases the implementation of algorithms on field programmable gate array (FPGA) platforms. This design option is based on two levels of programmability: it takes advantage of performance and reliability provided by state-of-the-art FPGA technology, and at the same time it provides the user with flexibility, performance and ease of reconfiguration typical of standard CGRAs. The basic cell template provides advanced features such as sub-word SIMD integer and floating-point computation capabilities, as well as saturating arithmetic. Multiple reconfiguration contexts and partial run-time reconfiguration capabilities are provided, thus tackling the problem of high reconfiguration overhead typical of FPGAs. Selected instances of the proposed architecture have been implemented on an Altera Stratix II EP2S180 FPGA. On this system, we mapped some common DSP, image processing, 3D graphics and audio compression algorithms in order to validate our approach and to demonstrate its effectiveness by benchmarking the benefits achieved.

11.

Research on dynamic reconfiguration issues in the design flow of reconfigurable systems-on-chip

陈宇 李仁发 朱海 袁虎 《Journal of Computer Research and Development》(计算机研究与发展) 2012,49(3):646-660

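In recent years, reconfigurable systems-on-chip have become an effective technical solution for meeting complex computing demands in scientific research and embedded applications. Two problems remain: there is no system design flow that plans dynamic reconfiguration in a unified, comprehensive way from system-level design through to application implementation, and the dynamic reconfiguration process is not transparent to system designers. To address them, a procedure-level unified hardware/software programming model is given at the system design layer. Within this framework, designers can describe system functionality in a high-level language simply by calling hardware/software co-functions that have already been optimized for the characteristics of the application. At the detailed design layer, a hardware/software task scheduling algorithm based on speedup per unit area is proposed to manage dynamically reconfigurable resources in real time. At the application implementation layer, a reconfigurable dedicated graphics accelerator card serves as the prototype system for discussing the key technologies in implementing dynamically reconfigurable systems. Experimental and test results verify that considering dynamic reconfiguration within the whole system design flow improves system development efficiency.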

12.

Dynamic partial reconfiguration and its FPGA implementation (total citations: 2; self-citations: 1; citations by others: 2)

李涛 刘培峰 杨愚鲁 《Computer Engineering》(计算机工程) 2006,32(14):224-226

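Dynamic partial reconfiguration makes full use of the reconfigurability that FPGA chips provide: it raises FPGA utilization, reduces FPGA configuration time, and effectively improves overall system performance. This paper introduces two methods of implementing dynamic partial reconfiguration and verifies them on a Spartan-II FPGA.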

13.

Adaptive run-time reconfigurable cache optimization

胡森森 苏加福 《Computer Engineering and Applications》(计算机工程与应用) 2018,54(4):25-30

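Because dynamically reconfigurable caches can reconfigure their capacity, structure, mapping rules and so on at run time, they have major advantages in resource utilization and energy consumption. Exploiting the fact that the issue width of very long instruction word (VLIW) processors changes dynamically, this paper proposes using that dynamic behaviour at run time to drive cache reconfiguration, thereby dynamically splitting or merging processor cores. This differs from the traditional approach of driving cache reconfiguration by the cache miss rate. To smooth cache performance under frequent reconfiguration, a transition mechanism for reconfiguration is further proposed so that the cache moves smoothly from one configuration to another. Experiments were designed to evaluate the reconfiguration strategy; simulation results show that, within 2,000 cycles after reconfiguration, the method lowers the cache miss rate by 16% on average and improves system performance.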

14.

Event-driven configuration of a neural network CMP system over an homogeneous interconnect fabric

M.M. Khan J. Navaridas X. Jin L.A. Plana M. Luján S. Temple C. Patterson D. Richards J.V. Woods J. Miguel-Alonso S.B. Furber 《Parallel Computing》2011,37(8):392-409

Configuring a million-core parallel system at boot time is a difficult process when the system has neither specialised hardware support for the configuration process nor a preconfigured default state that puts it in operating condition. The architecture of SpiNNaker, a parallel chip multiprocessor (CMP) system for neural network simulation, is in this class. To function as a universal neural chip, SpiNNaker uses an event-driven model with complete system virtualisation so that all components are generic and identical. Where most large CMP systems feature a sideband network to complete the boot process, SpiNNaker has a single homogeneous network interconnect for both application inter-processor communications and system control functions. This network improves fault tolerance and makes it easier to support dynamic run-time reconfiguration; however, it requires a boot process compatible with the application’s communications model. Here, we present such a boot loader, capable of bringing a generic, initially unconfigured parallel system into a working configuration. Since SpiNNaker uses event-driven asynchronous communications throughout, the loader operates with purely local control: there is no global synchronisation, state information, or transition sequence. A novel two-stage “unfolding” boot-up process efficiently configures the SpiNNaker hardware and loads the application using a high-speed flood-fill technique with support for run-time reconfiguration. SystemC simulation of a multi-CMP SpiNNaker system indicates an error-free CMP configuration time of ∼1.37 ms, while a high-level simulation of a full-scale system (64 K CMPs) indicates a mean application-loading time of ∼20 ms (for a 100 KB application), which is virtually independent of the size of the system. Further hardware-level Verilog simulation verified the cycle-accurate functionality of CMP configuration. The complete process illustrates a useful method for configuring large-scale event-driven parallel systems without having to provide dedicated hardware boot support or rely on system state assumptions.

15.

Self-organization of reconfigurable protocol stack for networked control systems

Chun-Jie Zhou Hui Chen Yuan-Qing Qin Yu-Feng Shi Guang-Can Yu 《International Journal of Automation and Computing》(国际自动化与计算杂志) 2011,8(2):221-235

In networked control systems (NCS), the control performance depends not only on the control algorithm but also on the communication protocol stack. The performance degradation introduced by the heterogeneous and dynamic communication environment has intensified the need for a reconfigurable protocol stack. In this paper, a novel architecture for the reconfigurable protocol stack is proposed, which is a unified specification of the protocol components and service interfaces supporting both static and dynamic reconfiguration for existing industrial communication standards. Within the architecture, a triple-level self-organization structure is designed to manage the dynamic reconfiguration procedure based on information exchanges inside and outside the protocol stack. In particular, the protocol stack can adapt itself to various environments and system requirements through the reconfiguration of working mode, routing and scheduling table. Finally, a study of the dynamic address management protocol is conducted for a controller area network (CAN) system. The results show the efficiency of our self-organizing architecture for the implementation of a reconfigurable protocol stack.

16.

On-line reconfiguration to enhance the routing flexibility of complex automated material handling operations

M.M. Wong C.H. Tan J.B. Zhang L.Q. Zhuang Y.Z. Zhao M. Luo 《Robotics and Computer》2007,23(3):294

Traditionally, automated material handling systems are not designed to be reconfigurable, and changes to layouts and material flow directions often require significant downtime for physical modifications and reprogramming to be made. As a consequence of global trends towards mass customisation, higher mix and lower volume production, shorter product life cycle, shorter lead-time, increasing velocity of goods across supply chains, and the increasing use of industrial information technology (IT), modern automated material handling systems (AMHS) should be designed with reconfiguration features to maximise utilisation and to enhance their flexibility. This paper focuses on how on-line reconfiguration of complex AMHS can be technically achieved (from both hardware and software considerations), which enables operational routing flexibility as well as recovery routing flexibility when faults occur. Three common “design for on-line reconfiguration” areas are first proposed, namely, resource availability of the entire AMHS or its components, reversible direction of an individual or a group of conveyors, and expandable operating range (or domain) of common-aisle transporters. Design considerations and implementation techniques for a class of reconfigurable AMHS are then discussed. Finally, we illustrate how on-line reconfiguration was implemented for a large-scale complex automated warehouse with several AMHS.

17.

Seamless switching of scalable video bitstreams for efficient streaming

Xiaoyan Sun Feng Wu Shipeng Li Wen Gao Ya-Qin Zhang 《Multimedia, IEEE Transactions on》2004,6(2):291-303

Efficient adaptation to channel bandwidth is broadly required for effective streaming video over the Internet. To address this requirement, a novel seamless switching scheme among scalable video bitstreams is proposed in this paper. It can significantly improve the performance of video streaming over a broad range of bit rates by fully taking advantage of both the high coding efficiency of nonscalable bitstreams and the flexibility of scalable bitstreams, where small channel bandwidth fluctuations are accommodated by the scalability of a single scalable bitstream, whereas large channel bandwidth fluctuations are tolerated by flexible switching between different scalable bitstreams. Two main techniques for switching between video bitstreams are proposed. Firstly, a novel coding scheme is proposed to enable drift-free switching at any frame from the current scalable bitstream to one operated at lower rates without sending any overhead bits. Secondly, a switching-frame coding scheme is proposed to greatly reduce the number of extra bits needed for switching from the current scalable bitstream to one operated at higher rates. Compared with existing approaches, such as switching between nonscalable bitstreams and streaming with a single scalable bitstream, our experimental results clearly show that the proposed scheme brings higher efficiency and more flexibility in video streaming.

18.

Mumbo: A Rule-Based Implementation of a Run-time Program Generation Language

Barış Aktemur Sam Kamin 《Electronic Notes in Theoretical Computer Science》2006,147(1):31

We describe our efforts to use rule-based programming to produce a model of Jumbo, a run-time program generation (RTPG) system for Java. Jumbo incorporates RTPG following the simple principle that the regular compiler (or, rather, its back-end) can be used both for ordinary, static compilation and for run-time compilation. This tends to produce a run-time compiler that is inefficient but potentially subject to improvement by partial evaluation. However, the complexity of the language and compiler has made it difficult for us to achieve actual optimization. The model, written in Maude, preserves all the essential ingredients of Jumbo, but operates on a simplified language, called Mumbo. The simplification in the language together with Maude's support for code rewriting has allowed us to make rapid progress. We discuss the model in detail, the kinds of optimizations we have obtained, and the impact on the Jumbo project.

19.

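Hardware/software partitioning of reconfigurable systems based on genetic algorithms (total citations: 3; self-citations: 0; citations by others: 3)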

李涛 杨愚鲁 马平 柴欣 《Computer Engineering and Applications》(计算机工程与应用) 2007,43(26):56-58

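Taking into account features such as dynamic partial reconfiguration and reconfiguration latency, a genetic algorithm, and a hybrid of it with hill climbing, are used to partition the hardware and software tasks of a reconfigurable system, and a dynamic-priority scheduling algorithm is used to evaluate the partitioning results. Experiments show that, under constraints such as the resources of the reconfigurable system, the algorithm effectively achieves the spatio-temporal mapping of application task graphs onto the reconfigurable system.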

20.

Principle and implementation of intrinsic hardware evolution

游霞 王友仁 周波 《Computer Measurement & Control》(计算机测量与控制) 2006,14(1):120-122

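This paper studies how to realize intrinsic hardware evolution. It discusses the three prerequisites: the RC1000 board as the physical substrate, the HereBoy evolutionary algorithm, and the JBits real-time reconfiguration interface. It then gives the concrete flow of intrinsic hardware evolution, which in essence uses JBits to perform real-time partial reconfiguration of the FPGA on the RC1000 board. Examples demonstrate that intrinsic hardware evolution based on the JBits API, the RC1000 board and a genetic algorithm is feasible. Convergence speed is compared across different encoding methods and different evolvable resources; the results show that multi-parameter cascaded encoding, with co-evolution of the LUTs and their interconnect, significantly improves convergence speed.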