Similar literature
20 similar documents found.
1.
In this paper, we propose a method for speeding up Digital Signal Processing applications by partitioning them between reconfigurable hardware blocks of different granularity and mapping the critical parts of applications onto coarse-grain reconfigurable hardware. The reconfigurable hardware blocks are embedded in a heterogeneous reconfigurable system architecture. The fine-grain part is implemented by an embedded FPGA unit, while for the coarse-grain reconfigurable hardware we use a high-performance coarse-grain data-path that we developed. The design flow consists of three main steps: the analysis procedure, the mapping onto coarse-grain blocks, and the mapping onto the fine-grain hardware. The methodology is validated using five real-life applications: an OFDM transmitter, a medical imaging technique, a wavelet-based image compressor, a video compression scheme, and a JPEG encoder. The experimental results show that the speedup, relative to an all-FPGA solution, ranges from 1.55 to 4.17 for the considered applications.

2.
Scientific application kernels mapped to reconfigurable hardware have been reported to achieve 10× to 100× speedup over equivalent software. These promising results suggest that reconfigurable logic might offer significant speedup on applications in science and engineering. To accurately assess the benefit of hardware acceleration on scientific applications, however, it is necessary to consider the entire application, including software components as well as the accelerated kernels. Aspects to be considered include alternative methods of hardware/software partitioning, communication costs, and opportunities for concurrent computation between software and hardware. Analysis of these factors is beyond the scope of current automatic parallelizing compilers. In this paper, a case study is presented in which a simulation of metropolitan road traffic networks is mapped onto a reconfigurable supercomputer, the Cray XD1. Five different methods are presented for mapping the application onto the combined hardware/software system. An approach for approximating the performance of each method is derived through analytic equations. Our results, both analytical and empirical, show that the key predictors of performance (which are often not considered in reported speedups of kernel operations) are not necessarily maximum parallelism; they must account for the fraction of the problem that runs on the reconfigurable logic and the amount of data flow between software and hardware.
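
The point about whole-application speedup can be illustrated with a minimal Amdahl-style sketch. This is not the analytic model derived in the paper, which is not reproduced here; the function name `overall_speedup` and the `comm_overhead` parameter are assumptions made for illustration only.

```python
def overall_speedup(accel_fraction, kernel_speedup, comm_overhead=0.0):
    """Amdahl-style estimate of whole-application speedup (illustrative only).

    accel_fraction: share of the original software run time moved to hardware (0..1).
    kernel_speedup: speedup of that share on the reconfigurable logic.
    comm_overhead:  extra software/hardware transfer time, expressed as a fraction
                    of the original run time (hypothetical parameter).
    """
    if not 0.0 <= accel_fraction <= 1.0:
        raise ValueError("accel_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    # Normalised new run time: untouched software + accelerated part + transfers.
    new_time = (1.0 - accel_fraction) + accel_fraction / kernel_speedup + comm_overhead
    return 1.0 / new_time


if __name__ == "__main__":
    # The same 100x kernel yields very different application-level speedups.
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, round(overall_speedup(fraction, 100.0, comm_overhead=0.02), 2))
```

Under these assumptions, a 100× kernel that covers half of the run time yields less than 2× overall, which is why the accelerated fraction and the data traffic matter more than peak kernel parallelism.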

3.
In this paper, we propose a configuration-aware data-partitioning approach for reconfigurable computing. We show how the reconfiguration overhead impacts the data-partitioning process. Moreover, we explore the system-level power-performance tradeoffs available when implementing streaming embedded applications on fine-grained reconfigurable architectures. For a certain group of streaming applications, we show that an efficient hardware/software partitioning algorithm is required when targeting low power. However, if the application objective is performance, then we propose the use of dynamically reconfigurable architectures. We propose a design methodology that adapts the architecture and algorithms to the application requirements. The methodology has been proven to work on a real research platform based on Xilinx devices. Finally, we have applied our methodology and algorithms to the case study of image sharpening, which is required nowadays in digital cameras and mobile phones.

4.
Heterogeneous reconfigurable systems provide drastically higher performance and lower power consumption than traditional CPU-centric systems. Moreover, they do so at much lower cost and with shorter time to market than non-reconfigurable hardware solutions. They also provide the flexibility that is often required for the engineering of modern robust and adaptive systems. Due to their heterogeneity, flexibility, and potential for highly optimized application-specific instantiation, reconfigurable systems are suitable for a very broad class of applications across different industry sectors. What prevents the reconfigurable system paradigm from spreading widely is the lack of adequate development methodologies and electronic design tools for this kind of system. The ideal would be a seamless compilation of a high-level computation process specification into an optimized mixture of machine code executed on traditional CPU-centric processors and on application-specific, decentralized, parallel, data-flow-dominated reconfigurable processors and hardware accelerators. Although much research and development in this direction has recently been performed, the methodologies and tools necessary to implement this compilation process as an effective and efficient hardware/software co-synthesis flow are unfortunately not yet in place. This paper focuses on recent developments and trends in the design methods and synthesis tools for reconfigurable systems. Reconfigurable system synthesis performs two basic tasks: system structure construction and mapping of application processes onto that structure. It is thus more complex than standard (multi-)processor-based system synthesis for software-programmable systems, which involves only application mapping. The system structure construction may involve macro-architecture synthesis, micro-architecture synthesis, and the actual hardware synthesis. The application process mapping can also be more complicated and dynamic in reconfigurable systems. This paper reviews recent methods and tools for macro- and micro-architecture synthesis and for the application mapping of reconfigurable systems. It pays particular attention to the currently active topic of (re-)configurable application-specific instruction set processor (ASIP) synthesis and, specifically, ASIP instruction set extension. It also discusses methods and tools for reconfigurable systems in which CPU-centric processors collaborate with reconfigurable hardware sub-systems. There, the main problem is to decide which computation processes should be implemented in software and which in hardware, and the hardware/software partitioning has to account for hardware sharing among different computation processes and for the reconfiguration processes. The reconfigurable system area is a very promising but still quite new field, with many open research and development topics. The paper reviews some of the future trends in reconfigurable system development methods and tools. Finally, the discussion is summarized and concluded.

5.
Reconfigurable Computing for Digital Signal Processing: A Survey   Total citations: 6, self-citations: 0, citations by others: 6
Steady advances in VLSI technology and design tools have greatly expanded the application domain of digital signal processing over the past decade. While application-specific integrated circuits (ASICs) and programmable digital signal processors (PDSPs) remain the implementation mechanisms of choice for many DSP applications, new system implementations based on reconfigurable computing are increasingly being considered. These flexible platforms, which offer the functional efficiency of hardware and the programmability of software, are quickly maturing as the logic capacity of programmable devices follows Moore's Law and advanced automated design techniques become available. As initial reconfigurable technologies have emerged, new academic and commercial efforts have been initiated to support power optimization, cost reduction, and enhanced run-time performance. This paper presents a survey of academic research and commercial development in reconfigurable computing for DSP systems over the past fifteen years. This work is placed in the context of other available DSP implementation media, including ASICs and PDSPs, to fully document the range of design choices available to system engineers. It is shown that while contemporary reconfigurable computing can be applied to a variety of DSP applications including video, audio, speech, and control, much work remains to realize its full potential. While individual implementations of PDSP, ASIC, and reconfigurable resources each offer distinct advantages, it is likely that integrated combinations of these technologies will provide more complete solutions.

6.
Day after day, embedded systems incorporate more compute-intensive applications into their end products: cryptography and image and video processing are examples found in leading markets such as consumer electronics and automotive. To meet these ever-increasing computational demands, hardware accelerators synthesized in field-programmable gate arrays (FPGAs) achieve processing speedups of orders of magnitude over their CPU-based software counterparts. However, the inherent increase in physical resources carries a cost penalty. To address this issue, dynamically reconfigurable hardware technology has now reached maturity. SRAM-based reconfigurable logic goes beyond the classical conception of static hardware resources distributed in space and held invariant for the entire application life cycle; it provides a new design abstraction characterized by the temporal partitioning of such resources to promote their continuous reuse, reconfiguring them on the fly to play a different role at each instant. This computing paradigm makes it possible to balance the design of embedded applications by partitioning their functionality in space and time (through a series of mutually exclusive processing tasks time-multiplexed on the same set of resources), thus achieving cost savings in both area and power. However, exploiting this versatility requires special attention to avoid performance degradation. These technical aspects are addressed in this work, which is intended as a survey of reconfigurable hardware technology and aims to define an open, standard, and cost-effective system architecture driven by flexible coprocessors instantiated on demand on the reconfigurable resources of an FPGA. This concept fits well with the functional features demanded of many embedded applications today, and its feasibility has been demonstrated on a state-of-the-art commercial SRAM-based FPGA platform. The results highlight dynamic partial reconfiguration as a potential technology to lead the next computing wave in the industry.

7.
This paper reviews, and introduces to the field of reconfigurable computer systems, some traditional techniques used in the fields of fault tolerance and testing of digital circuits. The target area is on-board spacecraft electronics, as this class of application is a good candidate for the use of reconfigurable computing technology. Fault-tolerant strategies are used so that the system can adapt itself to the severe conditions found in space. In addition, the paper describes some problems and possible solutions for the use of reconfigurable components, based on programmable logic, in space applications.

8.
The main focus of this article is the design of embedded signal processing (ESP) application software. We identify the characteristics of such applications in terms of their computational requirements, data layouts, and latency and throughput constraints. We describe an ESP application, an adaptive sonar beamformer. Then, we briefly survey the state of the art in high performance computing (HPC) technology and address the advantages and challenges of using HPC technology to implement ESP applications. To describe the software design issues in this context, we define a task model to capture the features of ESP applications. This model specifies the independent activities in each processing stage. We also identify various optimization problems in parallelizing ESP applications. We address the key issues in developing scalable and portable algorithms for ESP applications. We focus on the algorithmic issues in exploiting coarse-grain parallelism. These issues include data layout design and task mapping. We show a task mapping methodology for application software development based on our execution model (Lee et al., 1998). This uses a novel stage partitioning technique to exploit the independent activities in a processing stage. We use our methodology to maximize the throughput of an ESP application for a given platform size. The application software that results from this methodology is called a software task pipeline. An adaptive sonar beamformer has been implemented using this design methodology.

9.
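This article proposes an architectural model for Web applications with reconfigurable business logic. First, the business logic is abstracted as a sequence of actions and described in the form of a script. Second, a system architecture that uses an environment object (Env) as its data bus is designed and implemented to meet the execution requirements of the script, verifying in practice that the business logic can be reconfigured.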
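
The idea of business logic as a scripted action sequence running over a shared environment object might look roughly like the sketch below. The abstract does not give the script syntax or the `Env` interface, so the action names, the whitespace-separated script format, and the `values` dictionary are all hypothetical.

```python
class Env:
    """Shared context that every action reads from and writes to (the 'data bus')."""

    def __init__(self, **values):
        self.values = dict(values)


# Hypothetical actions: each one takes the Env and communicates only through it.
def load_order(env):
    env.values["order"] = {"price": 20.0, "quantity": 3}


def compute_total(env):
    order = env.values["order"]
    env.values["total"] = order["price"] * order["quantity"]


def apply_discount(env):
    env.values["total"] *= 1.0 - env.values.get("discount", 0.0)


ACTIONS = {
    "load_order": load_order,
    "compute_total": compute_total,
    "apply_discount": apply_discount,
}


def run_script(script, env):
    """Execute a whitespace-separated list of action names in order (assumed format)."""
    for name in script.split():
        ACTIONS[name](env)
    return env


if __name__ == "__main__":
    # Changing the business logic means editing the script, not the actions.
    env = run_script("load_order compute_total apply_discount", Env(discount=0.5))
    print(env.values["total"])
```

Because the actions share state only through `Env`, reordering or swapping steps is a change to the script text rather than to the application code.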

10.
Coarse-grained reconfigurable arrays (CGRAs) have shown potential for application in embedded systems in recent years. Numerous reconfigurable processing elements (PEs) in CGRAs provide flexibility while maintaining high performance by exploiting different levels of parallelism. However, a gap remains between the CGRA and the application-specific integrated circuit (ASIC). Some application domains, such as software-defined radios (SDRs), require flexibility as performance demands increase, so more effective CGRA architectures are expected to be developed. Customisation of a CGRA according to its application can improve performance and efficiency. This study proposes an application-specific CGRA architecture template composed of generic PEs (GPEs) and special PEs (SPEs). The hardware of the SPE can be customised to accelerate specific computational patterns. An automatic design methodology that includes pattern identification and application-specific function unit generation is also presented. A mapping algorithm based on ant colony optimisation is provided. Experimental results on the SDR target domain show that, compared with other ordinary and application-specific reconfigurable architectures, the CGRA generated by the proposed method performs more efficiently for the given applications.

11.
This paper presents performance improvements and energy savings from mapping real-world benchmarks onto an embedded single-chip platform that combines coarse-grained reconfigurable logic with a microprocessor. The reconfigurable hardware is a 2-D array of processing elements connected by a mesh-like network. Analytical results are presented from mapping seven real-life digital signal processing applications, with the aid of an automated design flow, onto six different instances of the system architecture. Significant overall application speedups relative to an all-software solution are reported, ranging from 1.81 to 3.99 and close to the theoretical speedup bounds. Additionally, the energy savings range from 43% to 71%. Finally, a comparison with a system coupling a microprocessor with a very long instruction word core shows that the microprocessor/coarse-grained reconfigurable array platform is more efficient in terms of performance and energy consumption.

12.
Reconfigurable hybrid processor systems provide a flexible platform for mapping data-parallel applications, while providing considerable speedup over software implementations. However, the overhead for reconfiguration presents a significant deterrent to mapping applications onto reconfigurable hardware. Partial runtime reconfiguration is one approach to reducing the reconfiguration overhead. In this paper, we present a methodology to map data-parallel tasks onto hardware that supports partial reconfiguration. The aim is to obtain the maximum possible speedup for a given reconfiguration time, bus speed, and computation speed. The proposed approach involves using multiple, identical but independent processing units in the reconfigurable hardware. Under nonzero reconfiguration overhead, we show that there exists an upper limit on the number of processing units that can be employed, beyond which further reduction in execution time is not possible. We obtain solutions for the minimum processing time, the corresponding load distribution, and the schedule for data transfer. To demonstrate the applicability of the analysis, we present the following: 1) various plots showing the variation of processing time with different parameters; 2) hardware simulations for two examples, viz., 1-D discrete wavelet transform and finite impulse response filter, targeted to Xilinx field-programmable gate arrays (FPGAs); and 3) experimental results for a hardware prototype implemented on an FPGA board.
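
Why an upper limit on the number of processing units appears can be seen in a deliberately simplified toy model. It is not the paper's analysis, which also covers bus speed, load distribution, and transfer scheduling. The assumptions here are that units are configured one after another at a fixed cost each and that the work then divides evenly; all names and parameters are hypothetical.

```python
def processing_time(n_units, work, unit_rate, reconfig_time):
    """Toy model: serial configuration of n units, then an even split of the work."""
    return n_units * reconfig_time + work / (n_units * unit_rate)


def best_unit_count(work, unit_rate, reconfig_time, max_units):
    """Return the unit count in 1..max_units that minimises the modelled time."""
    return min(
        range(1, max_units + 1),
        key=lambda n: processing_time(n, work, unit_rate, reconfig_time),
    )


if __name__ == "__main__":
    work, rate, reconfig = 1000.0, 1.0, 10.0
    n = best_unit_count(work, rate, reconfig, max_units=64)
    print(n, processing_time(n, work, rate, reconfig))
    # Adding units past the optimum makes the modelled time worse, not better.
    print(processing_time(2 * n, work, rate, reconfig))
```

In this model the configuration term grows linearly with the unit count while the compute term shrinks only as its reciprocal, so the total has a minimum at a finite number of units.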

13.
Dynamic and Partial FPGA Exploitation   Total citations: 1, self-citations: 0, citations by others: 1
Today's field programmable gate array (FPGA) architectures, like Xilinx's Virtex-II series, enable partial and dynamic run-time self-reconfiguration. This feature allows parts of a hardware design implemented on the reconfigurable hardware to be substituted, so a system can be adapted to the actual demands of the applications running on the chip. Exploiting this possibility enables the development of adaptive hardware for a huge variety of applications. A novel method for communication interfaces using look-up table (LUT)-based communication primitives enables an exact separation of reconfigurable parts and a fast and intelligent bus system. A new adaptive software/hardware reconfigurable system is presented in this paper, using a real application in the automotive domain implemented on a Xilinx Virtex-II 3000 FPGA to present results.

14.
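FPGA Circuit Design Based on Reconfigurable Cores   Total citations: 4, self-citations: 0, citations by others: 4
The need for adaptive, compact, and low-cost circuit systems has promoted hardware/software co-design in embedded systems. In-system reconfigurable FPGAs not only meet this need but are also well suited to the verification and reliability of programmable application-specific circuit designs. This paper introduces the implementation structures and evaluation methods of reconfigurable FPGAs, proposes a research model that represents a reconfigurable FPGA and its reconfigurable cores with linear vectors, and proposes modular design based on reconfigurable cores. It argues that category-oriented, domain-specific reconfigurable FPGAs should be the main research topic for reconfigurable FPGAs at the present stage.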

15.
Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. By mapping the compute-intensive sections of an application to reconfigurable hardware, custom computing systems exhibit significant speedups over traditional microprocessors. However, this potential acceleration is limited by the requirement that the speedups provided must outweigh the considerable cost of reconfiguration. The ability to relocate and defragment configurations on field programmable gate arrays (FPGAs) can dramatically decrease the overall reconfiguration overhead incurred by the use of the reconfigurable hardware. We therefore present hardware solutions to provide relocation and defragmentation support with a negligible area increase over a generic partially reconfigurable FPGA, as well as software algorithms for controlling this hardware. This results in factors of 8 to 12 improvement in the configuration overheads displayed by traditional serially programmed FPGAs.

16.
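A Non-Redundant Bypass-Node Insertion Algorithm for Cross-Layer Data Transmission in 2-D RCAs   Total citations: 1, self-citations: 0, citations by others: 1
To address cross-layer data transmission for hardware tasks on two-dimensional reconfigurable cell arrays (RCAs), a bypass-node insertion algorithm based on pre-order traversal with backtracking is proposed. For two types of data-flow graph, cross-layer input trees and cross-layer output trees, the algorithm preserves the logical relationships among the original operation nodes and inserts bypass nodes without redundancy. A quantitative evaluation index system and a pipelined model for partitioning and mapping in dynamically reconfigurable systems are given, together with the critical condition for mapping with bypass nodes. Experimental results show that, with the same system architecture and partitioning/mapping algorithm and with the critical condition satisfied, mapping with bypass nodes improves on mapping without them in the number of partitioned modules, the number of non-primary input/output operations, configuration time, total execution cycles, and power consumption. Compared with existing state-of-the-art algorithms, the proposed algorithm reduces the average total execution cycles by 23.3% (RCA 5×5) and 30.5% (RCA 8×8) and the average power consumption by 15.7% (RCA 5×5) and 18.6% (RCA 8×8), which verifies the soundness and effectiveness of the proposed method.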

17.
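To minimize the number of configurations (the number of partitions) in reconfigurable computer systems, a hardware task partitioning algorithm that combines area estimation with multi-objective optimization is proposed. The algorithm estimates the hardware resource area at every partitioning step and constructs a new probing function, prior_assigned(), that takes into account the use of reconfigurable resources, the sum of the execution delays of all partitions of a data-flow graph, and the number of edges between partitions. The function computes a priority value for each ready node, which the algorithm uses to dynamically adjust the scheduling order of task nodes in the ready list. Experimental results show that, compared with five existing partitioning algorithms (level-based partitioning, cluster-based partitioning, enhanced static list, multi-objective temporal partitioning, and cluster-level-sensitive partitioning), the proposed algorithm yields the fewest modules and, as the area of the reconfigurable processing unit grows, also the lowest mean execution delay of all except level-based partitioning.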
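
The general shape of ready-list partitioning under an area limit can be sketched as follows. This is only a generic illustration: the abstract does not define prior_assigned(), so the priority used here (largest area first) is a stand-in and not the paper's function, and the `partition` name and its data layout are assumptions.

```python
def partition(tasks, deps, area_limit):
    """Greedy ready-list partitioning of a task graph into area-bounded blocks.

    tasks: {name: area}; deps: {name: set of predecessor names}.
    Returns a list of blocks, each a list of task names in scheduling order.
    """
    remaining = dict(tasks)
    done = set()
    blocks = []
    while remaining:
        block, used = [], 0
        progress = True
        while progress:
            progress = False
            placed = done | set(block)
            # A task is ready once all predecessors are in earlier blocks or this one.
            ready = [t for t in remaining if deps.get(t, set()) <= placed]
            # Stand-in priority: largest area first, name as tie-break.
            for t in sorted(ready, key=lambda t: (-remaining[t], t)):
                if used + remaining[t] <= area_limit:
                    block.append(t)
                    used += remaining.pop(t)
                    progress = True
                    break
        if not block:
            raise ValueError("no task fits: area limit too small or cyclic dependencies")
        blocks.append(block)
        done.update(block)
    return blocks


if __name__ == "__main__":
    tasks = {"a": 2, "b": 2, "c": 3, "d": 1}
    deps = {"c": {"a"}, "d": {"b", "c"}}
    print(partition(tasks, deps, area_limit=4))
```

Each block corresponds to one configuration of the device, so fewer blocks mean fewer reconfigurations; a better priority function changes which ready node is tried first and can therefore reduce the block count.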

18.
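Building on the max-min ant system, the ant colony is given an added ability to discriminate between scents, and the method is applied to the routing problem of coarse-grained configurable chips. Using the CTaiJi coarse-grained reconfigurable chip developed by the authors as the target, a comparison over several test cases shows that this method is better at finding the optimal solution than the negotiated-congestion algorithm in common use today.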

19.
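冯晓, 李伟, 戴紫彬, 马超, 李功丽. Acta Electronica Sinica (《电子学报》), 2017, 45(6): 1311-1320
Among existing reconfigurable implementations of block ciphers, application-specific instruction processors have low throughput, while array architectures suffer from low resource utilization and a complex algorithm mapping process. To address this, a reconfigurable heterogeneous multi-core parallel processing architecture for block ciphers, RAMCA (Reconfigurable Asymmetrical Multi-Core Architecture), is designed, and the mapping onto RAMCA of algorithms with typical SP (AES-128), Feistel (SMS4), L-M (IDEA), and MISTY (KASUMI) structures is analyzed. Logic synthesis and functional simulation were completed in a 65 nm CMOS process. Experiments show that RAMCA can operate at 1 GHz with an area of about 1.13 mm². After normalizing for process technology, its speed on each block cipher algorithm exceeds that of existing application-specific instruction processors and of array-based cipher processing systems such as Celator, RCPA, and BCORE.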

20.
This article presents a systematic approach to hardware/software codesign targeting data-intensive applications. It focuses on application processes that can be represented as directed acyclic graphs (DAGs) and uses a synchronous dataflow (SDF) model, the popular form of dataflow employed in DSP systems, when running the process. The codesign system is based on the UltraSONIC reconfigurable platform, a system designed jointly at Imperial College and the SONY Broadcast Laboratory. This system is modeled as a loosely coupled structure consisting of a single instruction processor and multiple reconfigurable hardware elements. The paper also introduces and demonstrates a task-based hardware/software codesign environment specialized for real-time video applications. Both the automated partitioning and scheduling environment and the task manager program help to provide a fast and robust way of supporting demanding applications in the codesign system.
