
1.

董亚卓 左艳辉 刘明政 窦勇 《Computer Engineering & Science》2008,30(9):147-150

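An intermediate representation is the foundation for building compilers and high-level synthesis tools. This paper designs a compiler intermediate representation for reconfigurable hardware, as part of a high-level synthesis tool we have developed. Experimental results show that with this intermediate representation, C source programs can be mapped efficiently onto the target reconfigurable hardware.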

2.

Application design for configurable computing

Mangione-Smith W.H. 《Computer》1997,30(10):115-117

Configurable computing systems enhance traditional computing systems through the addition of programmable hardware. Configurable computing offers the opportunity to change the hardware/software partition at run time by reprogramming the hardware. Recent research has shifted to CAD and application development tools. Almost all existing configurable computing systems are based on field-programmable gate arrays (FPGAs). These devices implement reasonably arbitrary digital circuits, and this flexibility allows us to think of FPGA-based configurable computing systems as netlist computers. The configurable computing approach integrates FPGAs as an intimate and fundamental component of the computing system, rather than relegating them to their earlier role of supporting system prototyping and low-volume production. However, the author believes that automated approaches to the design of configurable computing systems are premature because they do not pay enough attention to performance.

3.

Efficient datapath merging for the overhead reduction of run-time reconfigurable systems

Mahmood Fazlali Ali Zakerolhosseini Georgi Gaydadjiev 《The Journal of supercomputing》2012,59(2):636-657

High FPGA reconfiguration latency is known to be a major overhead in run-time reconfigurable systems. This overhead can be reduced by merging multiple data flow graphs representing different kernels of the original program into a single (merged) datapath that needs to be configured less often than separate datapaths would. However, the additional hardware introduced by this technique increases the kernels' execution time. In this paper, we present a novel datapath merging technique that reduces both the configuration and execution times of kernels mapped onto the reconfigurable fabric. Experimental results show up to 13% reduction in the configuration and execution times of kernels from media-bench workloads, compared to prior work on datapath merging. Compared with conventional high-level synthesis algorithms, our proposal reduces kernels' configuration and execution times by up to 48%.
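To make the resource argument behind merging concrete, here is a deliberately simplified Python sketch, not the authors' algorithm. It assumes each kernel's data flow graph can be reduced to a bag of operation types with one functional unit per operation, and compares a merged datapath (which needs, per operation type, only the maximum that any single kernel uses) against separate datapaths (which need the sum). The multiplexers a real merged datapath requires, which cause the extra execution time mentioned above, are not modeled, and the function names are hypothetical.

```python
from collections import Counter


def resources(dfg_ops):
    """Functional units per operation type for one kernel (one unit per op, no sharing)."""
    return Counter(dfg_ops)


def merged_datapath(*kernels):
    """Per op type, allocate the maximum any single kernel needs; units are reused across kernels."""
    merged = Counter()
    for ops in kernels:
        for op, n in resources(ops).items():
            merged[op] = max(merged[op], n)
    return merged


def separate_datapaths(*kernels):
    """One private datapath per kernel: unit counts simply add up."""
    total = Counter()
    for ops in kernels:
        total += resources(ops)
    return total


k1 = ["add", "add", "mul"]
k2 = ["add", "mul", "mul", "sub"]
print(dict(merged_datapath(k1, k2)))     # {'add': 2, 'mul': 2, 'sub': 1}
print(dict(separate_datapaths(k1, k2)))  # {'add': 3, 'mul': 3, 'sub': 1}
```

The merged datapath is smaller and is configured once for both kernels, which is where the reconfiguration savings come from in this simplified view.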

4.

Seeking solutions in configurable computing

《Computer》1997,30(12):38-43

Configurable computing offers the potential of producing powerful new computing systems. Will current research overcome the dearth of commercial applicability to make such systems a reality? Unfortunately, no system to date has proven attractive or competitive enough to establish a commercial presence. We believe that ample opportunity exists for work in a broad range of areas. In particular, the configurable computing community should focus on refining the emerging architectures, producing more effective software/hardware APIs, building better application development tools that incorporate models of hardware reconfiguration, and developing effective benchmarking strategies.

5.

Micro-Task Processing in Heterogeneous Reconfigurable Systems


Sebastian Wallner 《Journal of Computer Science and Technology》2005,20(5):624-634

New reconfigurable computing architectures have been introduced to overcome some of the limitations of conventional microprocessors and fine-grained reconfigurable devices (e.g., FPGAs). One promising new class of architectures is the Configurable System-on-Chip (CSoC). CSoCs are designed to offer high computational performance for real-time signal processing and for a wide range of applications exhibiting high degrees of parallelism. Programming such systems is inherently challenging because they lack a programming model. This paper describes a novel heterogeneous system architecture for signal processing and data streaming applications. It offers high computational performance and a high degree of flexibility and adaptability by employing a micro Task Controller (mTC) unit in conjunction with programmable and configurable hardware. The hierarchically organized architecture provides a programming model, allows efficient mapping of applications, and is shown to scale easily to future VLSI technologies. Mappings of several commonly used digital signal processing algorithms for future telecommunication and multimedia systems are given, together with implementation results for a standard-cell ASIC realization in 0.18 micron 6-layer UMC CMOS technology.

6.

Guest Editors' Introduction: Advances in Configurable Computing

Lysaght P. Subrahmanyam P.A. 《Design & Test of Computers, IEEE》2005,22(2):85-89

At times, it appears that the many definitions of configurable computing are every bit as configurable as the technology itself. For example, Wikipedia, the free, online, user-editable encyclopedia, defines configurable computing (or, synonymously, reconfigurable computing) as "... computer processing with highly flexible computing fabrics. The principal difference when compared to using ordinary microprocessors is the ability to make substantial changes to the data path itself in addition to the control flow" (http://en.wikipedia.org/wiki/Configurable_computing).

7.

Parallel LU factorization of sparse matrices on FPGA‐based configurable computing engines

Xiaofang Wang Sotirios G. Ziavras 《Concurrency and Computation》2004,16(4):319-343

Configurable computing, where hardware resources are configured to match specific hardware designs, has recently demonstrated its ability to significantly improve performance for a wide range of computation‐intensive applications. With steady advances in silicon technology, as predicted by Moore's Law, Field‐Programmable Gate Array (FPGA) technologies have enabled the implementation of System‐on‐a‐Programmable‐Chip (SOPC or SOC) computing platforms, which, in turn, have given a significant boost to the field of configurable computing. It is possible to implement various specialized parallel machines in a single silicon chip. In this paper, we describe our design and implementation of a parallel machine on an SOPC development board, using multiple instances of a soft IP configurable processor; we use this machine for LU factorization. LU factorization is widely used in engineering and science to efficiently solve large systems of linear equations. Our implementation enables the efficient solution of linear equations at a cost much lower than that of supercomputers and networks of workstations. The intricacies of our FPGA‐based design are presented along with the tradeoff choices made, for the purpose of illustration. Performance results prove the viability of our approach. Copyright © 2004 John Wiley & Sons, Ltd.
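For readers unfamiliar with the kernel being accelerated, a minimal Doolittle LU factorization with forward and back substitution is sketched below. It is a sequential illustration under stated assumptions (dense matrix as lists of lists, no pivoting, nonzero pivots); the paper's design targets sparse matrices and distributes the work across multiple soft-core processors, which this sketch does not attempt. The function names are hypothetical.

```python
def lu_doolittle(a):
    """Factor square matrix a into unit-lower-triangular L and upper-triangular U (a = L @ U).

    No pivoting: assumes every pivot U[i][i] turns out nonzero.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U.
        for j in range(i, n):
            U[i][j] = a[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        if U[i][i] == 0:
            raise ZeroDivisionError("zero pivot: matrix is singular or needs pivoting")
        L[i][i] = 1.0
        # Column i of L, below the diagonal.
        for j in range(i + 1, n):
            L[j][i] = (a[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U


def lu_solve(L, U, b):
    """Solve L U x = b: forward substitution (L y = b), then back substitution (U x = y)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


A = [[4.0, 3.0], [6.0, 3.0]]
L, U = lu_doolittle(A)
print(L)                          # [[1.0, 0.0], [1.5, 1.0]]
print(U)                          # [[4.0, 3.0], [0.0, -1.5]]
print(lu_solve(L, U, [10.0, 12.0]))  # [1.0, 2.0]
```

Once A is factored, each new right-hand side costs only the two triangular solves, which is why the factorization step is the part worth accelerating.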

8.

ASCRA: An Application-Oriented Reconfigurable Compiler (in English)


吴艳霞 顾国昌 孙延腾 杨敏 杨杰 牛晓霞 孙霖 《Journal of Frontiers of Computer Science and Technology》2011,5(3):267-279

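Reconfigurable computing has been studied in many application domains, but because high-level design tools are lacking, designers need deep software and hardware expertise to develop programs for GPP/RAU architectures, which hinders large-scale adoption. This paper proposes the initial architecture of ASCRA, an application-oriented reconfigurable compiler that automatically maps C to VHDL, addressing the bottleneck of automatic compilation tools in reconfigurable computing. The ASCRA compiler focuses on hardware/software partitioning and hardware-oriented optimizations such as systolic arrays and loop pipelining. A verification platform for the ASCRA compiler was designed and implemented on the ML505 development platform, and experiments report synthesis results for the VHDL code generated from core program segments.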

9.

Partitioning Methodology for Heterogeneous Reconfigurable Functional Units

Michalis D. Galanis Gregory Dimitroulakos Costas E. Goutis 《The Journal of supercomputing》2006,38(1):17-34

A methodology for partitioning between reconfigurable hardware blocks of different granularity, embedded in a generic heterogeneous architecture, is presented. The fine-grain reconfigurable logic is realized by an FPGA unit, and the coarse-grain reconfigurable hardware by a 2-dimensional array of processing elements. Critical parts, called kernels, are mapped onto the coarse-grain reconfigurable logic to improve performance. The partitioning method consists of three main steps: analysis of the input code, mapping onto the Coarse-Grain Reconfigurable Array, and mapping onto the FPGA. The partitioning flow is implemented in a prototype software framework. Analytical partitioning experiments using five real-world applications show that the execution-time speedup relative to an all-FPGA solution ranges from 1.4 to 5.0.

10.

Research on Reconfigurable Programming Techniques Based on Impulse C

毛兴权 《Digital Community & Smart Home》2009,5(2):991-993

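Reconfigurable computing research uses highly flexible computing fabrics for high-performance computing. In recent years, research on building reconfigurable computing platforms from FPGA devices has proliferated. FPGA programming based on high-level languages frees software engineers from hardware details so they can concentrate on implementing algorithms. The Impulse C toolset is a relatively simple, C-based approach to hardware/software partitioning and hardware/software process co-design; combined with an efficient FPGA hardware compiler, it forms a complete method for mixed processor and FPGA implementations. These tools greatly simplify the design of reconfigurable components, but they still lag behind manual design in efficiency and circuit optimization.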

11.

A Module Mapping Algorithm for Reconfigurable Computing Systems


刘杰 吴强 赵全伟 《Computer Engineering》2012,38(3):276-279,283

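To eliminate the impact of reconfiguration time on the performance of reconfigurable computing systems, this paper proposes a module mapping algorithm, based on dynamic partial reconfiguration, for sequential application programs with multiple reconfigurable modules. By exploiting the efficiency and flexibility of dynamic reconfiguration to hide reconfiguration time, the algorithm reduces program execution time and improves system performance. Experimental results on a JPEG encoding test case show that the module mapping scheme produced by the algorithm runs 3.31 times as fast as a software implementation and 2.59 times as fast as a hardware implementation.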
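The idea of hiding reconfiguration time can be illustrated with a simple timing model. The sketch below is an assumption-laden illustration, not the paper's mapping algorithm: modules run one after another in program order, there is a single configuration port, and there are a given number of reconfigurable regions, so the next module can be loaded into a free region while the current one is still executing. The function names and the example timings are hypothetical.

```python
def serial_time(reconf, exe):
    """No overlap: every module is configured, then executed, strictly in turn."""
    return sum(reconf) + sum(exe)


def overlapped_time(reconf, exe, regions=2):
    """Configure module i while earlier modules execute.

    Assumptions: one configuration port (configurations are serialized), and module i
    reuses the region freed when module i - regions finished executing.
    """
    cfg_end, exec_end = [], []
    for i, (r, e) in enumerate(zip(reconf, exe)):
        region_free = exec_end[i - regions] if i >= regions else 0
        port_free = cfg_end[-1] if cfg_end else 0
        cfg_end.append(max(port_free, region_free) + r)
        prev_done = exec_end[-1] if exec_end else 0   # modules execute in program order
        exec_end.append(max(prev_done, cfg_end[-1]) + e)
    return exec_end[-1] if exec_end else 0


reconf, exe = [2, 2, 2], [3, 3, 3]
print(serial_time(reconf, exe))                # 15
print(overlapped_time(reconf, exe, regions=2))  # 11
print(overlapped_time(reconf, exe, regions=1))  # 15: with one region nothing can be hidden
```

With two regions, only the first reconfiguration stays on the critical path; the later ones are hidden behind execution.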

12.

Design and implementation of high-speed buffered crossbars with efficient load balancing for multi-core SoCs

George Kornaros Theofanis Orphanoudakis 《Microprocessors and Microsystems》2010,34(7-8):301-315

A large increase in the number of devices integrated on a single chip, together with the significant performance demands of modern applications, has led designers to a system development methodology based on integrating multiple pre-verified intellectual property cores. Yet design productivity requirements push designers to focus on key micro-architectural solutions that manage the scaling of multi-core SoCs more efficiently and to increase the degree of design automation, particularly as rapid prototyping using reconfigurable computing becomes mainstream. In this paper we present a novel interconnect architecture based on optimized components to efficiently manage SoCs that either follow a multi-core approach or are built to support SIMD-style applications that can exploit the processing power of a pool of hardware resources. First, we analyze the design of a crossbar featuring shared-memory combined input-crosspoint buffering as a solution for efficient on-chip interconnection; second, we describe the design of a load balancer featuring configurable proportional allocation of on-chip resources and in-order delivery as a solution for efficient scheduling and execution of processing tasks. The main focus of the paper is to describe and evaluate the mechanisms designed to distribute and manage data transfers, so as to interconnect the integrated cores efficiently and control access to available (on-chip or off-chip) resources in a number of embedded systems and applications. The proposed architecture handles each of these challenges efficiently in terms of performance, silicon cost, and flexibility.
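Proportional allocation of work to resources can be illustrated in software with smooth weighted round-robin, which interleaves selections so that each resource receives a share proportional to its weight. This is a swapped-in, well-known scheduling technique used only for illustration; the abstract does not describe the paper's hardware load balancer (or its in-order delivery mechanism) in enough detail to reproduce it, and the function name is hypothetical.

```python
def smooth_weighted_rr(weights, n):
    """Return n resource indices, each chosen in proportion to its weight.

    Every round, each resource gains its weight; the one with the largest running
    credit is picked and pays back the total weight, which spreads picks out evenly.
    """
    total = sum(weights)
    credit = [0] * len(weights)
    picks = []
    for _ in range(n):
        for i, w in enumerate(weights):
            credit[i] += w
        best = max(range(len(weights)), key=credit.__getitem__)  # first maximum wins ties
        credit[best] -= total
        picks.append(best)
    return picks


print(smooth_weighted_rr([5, 1, 1], 7))  # [0, 0, 1, 0, 2, 0, 0]
```

Over each cycle of 7 selections, resource 0 receives 5 tasks and resources 1 and 2 receive one each, without bunching resource 0's tasks together.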

13.

Compiling Swarm Intelligence System Applications: A Case Study of a Fully Distributed Intelligent Building System

陈文杰 杨启亮 姜子炎 邢建春 周启臻 邹荣伟 冯博伟 《Journal of Software》2024,35(6)

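Swarm intelligence systems accomplish system-level application tasks through information exchange between neighboring individuals, and offer good robustness and flexibility. At the same time, most developers find it difficult to describe distributed, parallel interaction mechanisms among individuals. Some high-level languages let users program parallel swarm intelligence computing tasks from a global system perspective and with a sequential mindset, without considering low-level interaction details such as communication protocols and data distribution. However, the large semantic gap between user-facing, globally declarative swarm intelligence applications and the parallel execution logic of individuals makes compilation complex, which in turn lowers application development efficiency. This paper proposes a compilation system and supporting tools that translate high-level swarm intelligence applications into safe, efficient distributed implementations. Using parallel information identification, computation partitioning, and interaction information generation, the compilation system compiles globally oriented, sequentially programmed swarm intelligence applications into parallel target code that each individual executes independently, so users need not understand the complex interaction mechanisms among individuals. A standardized intermediate representation is designed that converts complex swarm intelligence computing tasks into a sequence of standardized semantic modules composed of swarm intelligence operators and input/output variables; it represents source program information in a platform-independent form and masks the heterogeneity of target hardware platforms. The compilation system was deployed and tested on a swarm intelligence system case-study platform. The results show that it effectively compiles swarm intelligence applications into target code executable on the platform and improves application development efficiency, and that the generated code outperforms existing compilers on a series of benchmarks.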

14.

Virtualization of reconfigurable coprocessors in HPRC systems with multicore architecture

Ivan Gonzalez Sergio Lopez-Buedo Gustavo Sutter Diego Sanchez-Roman Francisco J. Gomez-Arribas Javier Aracil 《Journal of Systems Architecture》2012,58(6-7):247-256

HPRC (High-Performance Reconfigurable Computing) systems include multicore processors and reconfigurable devices acting as custom coprocessors. Due to economic constraints, the number of reconfigurable devices is usually smaller than the number of processor cores, which prevents a 1:1 mapping between cores and coprocessors. This paper presents a solution to this problem based on the virtualization of reconfigurable coprocessors. A Virtual Coprocessor Monitor (VCM) has been devised for the XtremeData XD2000i In-Socket Accelerator, and a thread-safe API is available for user applications to communicate with the VCM. Two reference applications, an IDEA cipher and an Euler CFD solver, have been implemented to validate the proposed architecture and execution model. Results show that the benefits of coprocessor virtualization outweigh its overhead, especially when the code has a significant software component.
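The core problem, more software threads than coprocessors, can be illustrated with a thread-safe pool that hands each request a free device and blocks callers while all devices are busy. This is a hypothetical Python sketch: it does not reproduce the VCM or the XD2000i API, the class and method names are invented for illustration, and a plain function stands in for the coprocessor call.

```python
import queue
import threading


class VirtualCoprocessorPool:
    """Hypothetical stand-in for a coprocessor monitor: many threads share a few devices."""

    def __init__(self, num_devices):
        self._free = queue.Queue()          # thread-safe queue of idle device ids
        for dev_id in range(num_devices):
            self._free.put(dev_id)
        self._lock = threading.Lock()       # protects the statistics below
        self._active = 0
        self.max_active = 0                 # peak number of devices in use at once
        self.usage = [0] * num_devices      # jobs served per device

    def run(self, job, *args):
        dev = self._free.get()              # blocks until some device is idle
        with self._lock:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
            self.usage[dev] += 1
        try:
            return job(dev, *args)          # stands in for the coprocessor call
        finally:
            with self._lock:
                self._active -= 1
            self._free.put(dev)             # hand the device back to the pool


pool = VirtualCoprocessorPool(num_devices=2)
results = [None] * 8


def worker(i):
    results[i] = pool.run(lambda dev, x: x * x, i)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Eight threads are served by two devices; no thread needs to know which physical device it received, which is the essence of virtualizing the coprocessors.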

15.

C2FPGA—A dependency-timing graph design methodology

Sunita Chandrasekaran Shilpa Shanbagh Ramkumar Jayaraman Douglas L. Maskell Hui Yan Cheah 《Journal of Parallel and Distributed Computing》2013

In this paper, we present a design methodology that uses a combined graphical and scheduling technique to map applications written in a C-based high-level language (HLL) to FPGAs. Although a number of approaches address the mapping from HLL to hardware, many existing solutions either require a steep learning curve or do not produce an appropriate mapping pattern for the hardware platform. We address this problem by analyzing the data flow and data dependencies in the given code and proposing a scheduling pattern for the given algorithm. We then derive a suitable mapping pattern for the hardware platform and use it to generate synthesizable HDL (Verilog) code. We demonstrate our design methodology with results from several real-time case studies based on different algorithms.
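As a rough illustration of the dependency-analysis step, the sketch below derives read-after-write dependencies from a list of statements and assigns each one an as-soon-as-possible (ASAP) step; statements in the same step could run in parallel in hardware. It assumes single-assignment code (so write-after-read and write-after-write hazards can be ignored) and unit latency per statement, and it is not the paper's dependency-timing graph method. The function name and statement format are hypothetical.

```python
def asap_schedule(stmts):
    """stmts: (target, [sources]) pairs in program order.

    Returns one step per statement: 0 if it reads only inputs, otherwise one more
    than the latest step of any statement it reads from (read-after-write only).
    """
    last_writer = {}   # variable name -> index of the statement that produced it
    steps = []
    for idx, (target, sources) in enumerate(stmts):
        deps = [last_writer[s] for s in sources if s in last_writer]
        steps.append(max((steps[d] + 1 for d in deps), default=0))
        last_writer[target] = idx
    return steps


stmts = [
    ("a", ["x", "y"]),   # a = f(x, y)
    ("b", ["x"]),        # b = g(x)      independent of a
    ("c", ["a", "b"]),   # c = h(a, b)   waits for a and b
    ("d", ["c", "y"]),   # d = k(c, y)   waits for c
]
print(asap_schedule(stmts))  # [0, 0, 1, 2]
```

Statements a and b share step 0, so a hardware mapping could place them on parallel units; the resulting schedule length (3 steps) is the depth of the dependency graph.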

16.

Using design space exploration for finding schedules with guaranteed reaction times of synchronous programs on multi-core architecture

《Journal of Systems Architecture》2017

The synchronous model of computation is well suited for real-time systems because it allows static analysis to find and guarantee their reaction times. Multi-core systems are becoming the predominant computing platforms today. However, synchronous programs are typically compiled into single-threaded code, which prevents them from exploiting the parallelism of multi-core platforms. Moreover, static timing analysis becomes highly intractable for multi-core systems. This article proposes a novel methodology for finding a mapping and schedule of a synchronous program that statically guarantees its reaction time when it is mapped onto a multi-core system consisting of two types of time-predictable cores. The proposed methodology combines design space exploration based on an evolutionary algorithm with scheduling of parts of synchronous programs. It minimizes resource usage in terms of the number of cores by finding, for architectures with different numbers of cores, the mapping and schedule that guarantee the reaction time. In particular, we: (a) transform a program written in the synchronous language SystemJ into a graph-based model with two types of computation nodes, suited to execution on the two types of time-predictable cores, (b) map computation nodes onto a customizable multi-core platform using genetic operations, and (c) generate a static schedule of computation nodes for each mapping as part of the design space exploration. The design flow, from program specification and node mapping to design space exploration and multi-core scheduling, is completely automated.

17.

A Compile-Time Method for Estimating the Execution Time of Basic Blocks on Reconfigurable Devices

沈英哲 周学海 《Journal of Chinese Computer Systems》2007,28(8):1496-1501

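To compute quickly and accurately during compilation the execution time of a program segment implemented directly in hardware, and thereby give hardware/software code partitioning more useful guidance, this paper proposes an IP-core-based code transformation mechanism and a general method for describing reconfigurable devices, together with a method that uses both to compute, at compile time, the execution time of a basic block on reconfigurable hardware. Experimental results show that although some approximations and assumptions are made to reduce the hardware synthesis and place-and-route workload during compilation, the method's average error relative to the Time Analyzer tool in Xilinx ISE is 2.9%, and its maximum error is 8.2%.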
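An estimate of this kind can be sketched as a critical-path computation over a basic block's data flow, where each operation's delay comes from the IP core that implements it. The latency table and function name below are hypothetical, and the sketch ignores routing delay, resource sharing, and clocking, all of which a real estimator such as the one described has to account for.

```python
# Hypothetical per-operation delays (ns) of library IP cores; a real tool would
# take these from its reconfigurable-device description.
IP_LATENCY_NS = {"add": 2.0, "mul": 5.0, "shl": 1.0}


def estimate_block_time(ops):
    """ops: (result, op_type, [input names]) in topological order.

    Each operation starts when its latest-produced input is ready (block inputs are
    ready at time 0); the block's delay is the latest finishing time, i.e. the critical path.
    """
    finish = {}
    for name, op_type, inputs in ops:
        start = max((finish[i] for i in inputs if i in finish), default=0.0)
        finish[name] = start + IP_LATENCY_NS[op_type]
    return max(finish.values(), default=0.0)


block = [
    ("t1", "mul", ["a", "b"]),    # ready at 5.0
    ("t2", "add", ["c", "d"]),    # ready at 2.0, in parallel with t1
    ("t3", "add", ["t1", "t2"]),  # waits for t1: 5.0 + 2.0
    ("t4", "shl", ["t3"]),        # 7.0 + 1.0
]
print(estimate_block_time(block))  # 8.0
```

Because it needs only a table lookup per operation and one pass over the block, this style of estimate is cheap enough to run inside a compiler's partitioning loop, unlike full synthesis and place-and-route.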

18.

Extending eCos for Reconfigurable Systems

张亮忠 熊选东 付建丹 王松锋 《Application Research of Computers》2012,29(5):1768-1771

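To address problems in reconfigurable systems such as inflexible task models, long reconfiguration latency for hardware tasks, and low FPGA resource utilization, this paper proposes a partitioning scheme that divides applications into software tasks and hybrid tasks. Building on eCos, it designs eCos4RC, an embedded operating system framework supporting reconfigurable systems, by extending three areas: the reconfiguration control mechanism, the hybrid task management mechanism, and the communication mechanism. Simulation results show that eCos4RC manages hybrid tasks effectively and, while remaining compatible with the eCos multithreading mechanism, improves application execution speed and reconfigurable resource utilization, providing good runtime support for reconfigurable computing platforms.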

19.

Trident: From High-Level Language to Hardware Circuitry

Tripp J.L. Gokhale M.B. Peterson K.D. 《Computer》2007,40(3):28-37


20.

Enabling fuzzy technologies in high performance networking via an open FPGA-based development platform

Federico Montesino Pouzols Angel Barriga Barros 《Applied Soft Computing》2012,12(4):1440-1450

Soft computing techniques, and particularly fuzzy inference systems, are gaining momentum as tools for network traffic modeling, analysis, and control. However, efficient hardware implementations of these techniques that can operate in real time in high-speed networking equipment, as well as in other highly time-constrained application fields, remain an open problem. We introduce a development platform for fuzzy inference systems with applications to network traffic analysis and control. The platform addresses the current requirements and constraints of high-performance networking equipment. For the development process, we set up a methodology and a CAD tool chain that span the entire design process, from initial specification in a high-level language to implementation on FPGA devices. An FPGA development board with a PCI/PCIe interface is employed to support an open platform that comprises CAD tools as well as IP cores. PCI-compatible fuzzy inference modules are implemented as System-on-Programmable-Chip (SoPC) designs. We present satisfactory experimental results from the implementation of fuzzy systems for a number of applications in the analysis and control of Internet traffic. These systems are shown to satisfy the operational and architectural requirements of current and future high-performance routing equipment. The proposed platform allows prototypes to be developed while avoiding the large investments and complicated management procedures that constrain the testing and adoption of soft computing techniques in high-performance networking.
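To show what a fuzzy inference system computes, here is a minimal zero-order Sugeno-style controller with triangular membership functions. The input (normalized queue occupancy), the three rules, and their output values (drop probabilities) are hypothetical choices made for illustration; the paper's platform targets FPGA hardware and its rule bases are not reproduced here.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets over normalized queue occupancy in [0, 1].
SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
# Hypothetical rules: "IF occupancy IS <set> THEN drop probability = <value>".
RULES = {"low": 0.0, "medium": 0.2, "high": 0.8}


def infer(occupancy):
    """Zero-order Sugeno inference: firing-strength-weighted average of rule outputs."""
    weights = {name: tri(occupancy, *abc) for name, abc in SETS.items()}
    total = sum(weights.values())
    if total == 0.0:
        return 0.0  # no rule fires (input outside the covered range)
    return sum(weights[name] * RULES[name] for name in RULES) / total


for occ in (0.0, 0.75, 1.0):
    print(occ, round(infer(occ), 3))
```

Each step (membership lookup, multiply, accumulate, divide) is a fixed, small amount of arithmetic, which is what makes such controllers good candidates for pipelined FPGA implementations at line rate.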