Similar literature
 20 similar documents found (search time: 78 ms)
1.
Reconfigurable architectures that tightly integrate a standard CPU core with a field-programmable hardware structure have recently been receiving increased attention. The design of such a hybrid reconfigurable processor involves a multitude of design decisions regarding the field-programmable structure as well as its system integration with the CPU core. Determining the impact of these design decisions on the overall system performance is a challenging task. In this paper, we first present a framework for the cycle-accurate performance evaluation of hybrid reconfigurable processors on the system level. Then, we discuss a reconfigurable processor for data-streaming applications, which attaches a coarse-grained reconfigurable unit to the coprocessor interface of a standard embedded CPU core. By means of a case study we evaluate the system-level impact of certain design features for the reconfigurable unit, such as multiple contexts, register replication, and hardware context scheduling. The results illustrate that a system-level evaluation framework is of paramount importance for studying the architectural trade-offs and optimizing design parameters for reconfigurable processors.  相似文献   

2.
Two of the most important design issues for next generation handheld devices are wireless networking and the processing of multimedia content. Both applications rely heavily on computationally intensive digital signal processing algorithms. Programmable architectures that keep pace with the increasing performance requirements become more and more power hungry. This is problematic for a battery powered mobile device, since it has only a limited amount of energy available. Conversely, dedicated architectures are too inflexible to keep pace with changing standards and feature sets. A mobile device requires high performance, flexibility and (energy-)efficiency. These conflicting requirements need to be balanced in the system architecture of a mobile device. In this paper a heterogeneous architecture of domain specific processing tiles is proposed. The focal point is the coarse-grained reconfigurable architecture of the Montium processing tile, which is designed to execute digital signal processing algorithms energy-efficiently.

3.
In this paper, we present an approach to the problem of low energy data scheduling for reconfigurable architectures targeting digital signal processing (DSP) and multimedia applications. The main goal is the reduction of the energy consumed by these applications through the integration of the proposed data management framework within a compilation tool specifically conceived for these architectures. Two levels of on-chip data storage are assumed to be available in the reconfigurable architecture. The data manager then tries to exploit this storage hierarchy optimally by saving data transfers between on-chip and external memories, thereby reducing energy consumption. To that end, specific algorithms for finding the data shared among the different computation kernels of the application have been developed. A data placement and replacement policy has also been designed. We also show how adequate data scheduling could decrease the number of operations required to implement the dynamic reconfiguration of the system.

4.
Recent research in reduced instruction set computer architectures has emphasized the importance of the empirical approach to designing computer architectures: architectural features are analyzed for utility and cost with respect to the system software that uses them. This approach has resulted in architectural simulators that allow computer designers to vary the features of the architecture being simulated and to analyze how the addition or removal of these features affects the cost and performance of the architecture. In this paper we apply this technique to a new area: reconfigurable architectures. Our approach is to use an empirical methodology that emphasizes the interaction between the target software and the reconfigurability features of parallel architectures. We have developed a set of tools, the reconfigurable architecture workbench, that assists in this methodology by allowing parallel programs to be simulated on a target architecture in order to study the performance implications of various reconfigurability features. The workbench is based on a framework, the PCI model, which describes the range of parallel programs, parallel architectures, and reconfiguration features. We present details of the design and implementation of a prototype workbench, GT-RAW. GT-RAW is being used to study the utility of one dimension of reconfiguration for image processing and image understanding applications. We present an example of the experiments that are being conducted with GT-RAW as a demonstration of our empirical methodology.  相似文献   

5.
New standards in signal, multimedia, and network processing for embedded electronics are characterized by computationally intensive algorithms and by high flexibility demands due to swift changes in specifications. In order to meet the demanding challenges of increasing computational requirements and stringent constraints on area and power consumption in embedded engineering, there is a gradual trend towards coarse-grained parallel embedded processors. Furthermore, such processors are equipped with dynamic reconfiguration features for supporting time- and space-multiplexed execution of the algorithms. However, the formidable problem of efficiently mapping applications (mostly loop algorithms) onto such architectures has been a hindrance to their mass acceptance. In this paper we present (a) a highly parameterizable, tightly coupled, and reconfigurable parallel processor architecture together with the corresponding power breakdown and reconfiguration time analysis of a case study application, (b) a retargetable methodology for mapping of loop algorithms, (c) a co-design framework for modeling, simulation, and programming of such architectures, and (d) loosely coupled communication with the host processor.

6.
Dataflow specifications are suitable to describe both signal processing applications and the related specialized hardware architectures, helping to close the hardware–software development gap. They can be exploited for the development of automatic tools aimed at the integration of multiple applications on the same coarse-grained computational substrate. In this paper, the multi-dataflow composer (MDC) tool, a novel automatic platform builder exploiting dataflow specifications for the creation of run-time reconfigurable multi-application systems, is presented and evaluated. In order to prove the effectiveness of the adopted approach, a coprocessor for still image and video processing acceleration has been assembled and implemented on both FPGA and 90 nm ASIC technology. Savings of 60% in both area occupancy and power consumption can be achieved with the MDC generated coprocessor compared to an equivalent non-reconfigurable design, without performance losses. Thanks to the generality of the high-level dataflow specification approach, this tool can be successfully applied in different application domains.

7.
Cellular computing architectures represent an important class of computation characterized by simple processing elements, local interconnect and massive parallelism. These architectures are a good match for many image and video processing applications and can be substantially accelerated with Reconfigurable Computers. We present a flexible software/hardware framework for design, implementation and automatic synthesis of cellular image processing algorithms. The system provides an extremely flexible set of parallel, pipelined and time-multiplexed components which can be tailored through reconfigurable hardware for particular applications. The most novel aspects of our framework include a highly pipelined architecture for multi-scale cellular image processing as well as support for several different pattern recognition applications. In this paper, we describe the system in detail and present our performance assessments. The system achieved a speed-up of at least 100× for computationally expensive sub-problems and 10× for end-to-end applications compared to software implementations.

8.
Highly regular multi-processor architectures are suitable for inherently highly parallelizable applications such as most of the image processing domain. Systems embedded in a single programmable chip platform (SoPC) allow hardware designers to tailor every aspect of the architecture in order to match the specific application needs. These platforms are now large enough to embed an increasing number of cores, allowing implementation of a multi-processor architecture with an embedded communication network. In this paper we present the parallelization and the embedding of a real time image stabilization algorithm on a SoPC platform. Our overall hardware implementation method is based upon meeting algorithm processing power requirements and communication needs with refinement of a generic parallel architecture model. Actual implementation is done by the choice and parameterization of readily available reconfigurable hardware modules and customizable commercially available IPs (Intellectual Property). We present both software and hardware implementation with performance results on a Xilinx SoPC target.  相似文献   

9.
The increased transistor count resulting from ever-decreasing feature sizes has enabled the design of architectures containing many small but efficient processing units (cores). At the same time, many new applications have evolved with varying performance requirements. The fixed architecture of multiCore platforms often fails to accommodate the inherently diverse requirements of different applications. We present a dynamically reconfigurable multiCore architecture that detects program phase change at runtime and adapts to the changing program behavior by reconfiguring itself. We introduce simple but efficient performance counters to monitor vital parameters of reconfigurable architectures. We also present static, dynamic and adaptive reconfiguration techniques for reconfiguring the architecture. Our evaluation of the proposed reconfigurable architecture using an adaptive reconfiguration technique shows an improvement of up to 23% for multi-threaded applications and up to 27% for multiprogrammed workloads over that on statically chosen architectures, and up to 41% over the baseline SMP configuration.

10.
Coarse-grained reconfigurable architectures (CGRAs) can be tailored and optimized for different application domains. The vast design space of coarse-grained reconfigurable architectures complicates the design of optimized processors. The goal is to design a domain-specific processor that provides just enough flexibility for that domain while minimizing the energy consumption for a given level of performance. However, a flexible architecture template and a retargetable simulator and compiler enable systematic architecture exploration that can lead to more efficient domain-specific architecture design. This article presents such an environment and an architecture exploration for a novel CGRA template.

11.
Traditionally, mechanically steered dishes or analog phased array beamforming systems have been used for radio frequency receivers, where strong directivity and high performance were much more important than low-cost requirements. Real-time controlled digital phased array beamforming could not be realized due to the high computational requirements and the implementation costs. Today, digital hardware has become powerful enough to perform the massive number of operations required for real-time digital beamforming. With the continuously decreasing price per transistor, high performance signal processing has become available by using multi-processor architectures. More and more applications are using beamforming to improve the spatial utilization of communication channels, resulting in many dedicated digital architectures for specific applications. By using a reconfigurable architecture, a single hardware platform can be used for different applications with different processing needs. In this article, we show how a reconfigurable multi-processor system-on-chip based architecture can be used for phased array processing, including an advanced tracking mechanism to continuously receive signals with a mobile satellite receiver. An adaptive beamformer for DVB-S satellite reception is presented that uses an Extended Constant Modulus Algorithm to track satellites. The receiver consists of 8 antennas and is mapped on three reconfigurable Montium TP processors. With a scenario based on a phased array antenna mounted on the roof of a car, we show that the adaptive steering algorithm is robust in dynamic scenarios and correctly demodulates the received signal.

12.
New reconfigurable computing architectures are introduced to overcome some of the limitations of conventional microprocessors and fine-grained reconfigurable devices (e.g., FPGAs). Among the promising new architectures are Configurable System-on-Chip (CSoC) solutions. They were designed to offer high computational performance for real-time signal processing and for a wide range of applications exhibiting high degrees of parallelism. The programming of such systems is an inherently challenging problem due to the lack of a programming model. This paper describes a novel heterogeneous system architecture for signal processing and data streaming applications. It offers high computational performance and a high degree of flexibility and adaptability by employing a micro Task Controller (mTC) unit in conjunction with programmable and configurable hardware. The hierarchically organized architecture provides a programming model, allows an efficient mapping of applications and is shown to be easily scalable to future VLSI technologies. Several mappings of commonly used digital signal processing algorithms for future telecommunication and multimedia systems are presented, and implementation results are given for a standard-cell ASIC design realization in 0.18 micron 6-layer UMC CMOS technology.

13.
Many applications perceive visual information through networks of embedded sensors. Intensive image processing computations have to be performed in order to process the perceived information. Such computations usually demand hardware implementations in order to exhibit real time performance. Furthermore, many such applications are hard to characterize a priori, since they take different paths according to events happening in the scene at runtime. Hence, reconfigurable hardware devices are the only viable platform for implementing such applications, providing both real time performance and dynamic adaptability for the system. In this paper, we present a collaborative and dynamically adaptive object tracking system that has been built in our lab. We exploit reconfigurable hardware devices embedded in a number of networked cameras in order to achieve our goal. We justify the need for dynamic adaptation of the system through scenarios and applications. Experimental results on a set of scenes show that our system works effectively across different event scenarios through reconfiguration. A comparison with non-adaptive implementations confirms that our approach improves the system's robustness to scene variations and outperforms traditional implementations.

14.
Highly regular many-core architectures are becoming increasingly popular, as they are suitable for inherently highly parallelizable applications such as most of the image and video processing domain. In this article, we present a novel architecture for a many-core microprocessor ASIC dedicated to embedded video and image processing applications. We propose a flexible many-core approach with two architectures: one implemented in CMOS 65 nm technology containing 16 open-source tiles, and the other implemented in CMOS FD-SOI 28 nm technology containing 64 open-source tiles. Each tile of these architectures can choose its communication links depending on the most relevant overall parallelism scheme for a targeted application. Both chips are fully functional in simulation. The layouts are presented with frequency, area and power consumption results. Various case studies are presented to illustrate the proposed flexible many-core architectures, focusing on architecture exploration, the instantiated parallelization scheme, and timing performance.

15.
In this paper we propose a framework for modeling and automated generation of heterogeneous SoC architectures with emphasis on reconfigurable component integration and optimized communication media. In order to facilitate rapid development of SoC architectures, communication-centric platforms for data intensive applications, high level modeling of reconfigurable components for quick simulation, and a tool for generation of complete SoC architectures are presented. Four different communication-centric platforms based on traditional bus, crossbar, hierarchical bus and novel hybrid communication media are proposed. These communication-centric platforms are proposed to cater for the different communication requirements of future SoC architectures. Multi-standard telecommunication is chosen as our target application domain, and a case study of WiMAX is used as a real world example to demonstrate the effectiveness of our approach. A system consisting of an ARM processor, a reconfigurable FFT and a reconfigurable Viterbi decoder is considered, with the option of system scalability for future upgrades. The behavior of the system with different communication platforms is analyzed for its throughput and power characteristics under different reconfiguration scenarios to show the effectiveness of our approach.

16.
This paper is concerned with the analytical modeling of computer architectures to aid in the design of high-level language-directed computer architectures. High-level language-directed computers are computers that execute programs in a high-level language directly. The design procedure for these computers is at best described as ad hoc. In order to systematize the design procedure, we introduce analytical models of computers that predict the performance of parallel computations on concurrent computers. We model computers as queueing networks and parallel computations as precedence graphs. The models that we propose are simple and lead to computationally efficient procedures for predicting the performance of parallel computations on concurrent computers. We demonstrate the use of these models in the design of high-level language-directed computer architectures.

17.
Dynamically reconfigurable architectures or systems are able to reconfigure their function and/or structure to suit the changing needs of a computation during run time. The increasing flexibility of modern dynamically reconfigurable systems improves their adaptability to computational needs but also makes fast reconfiguration difficult because of the large amount of reconfiguration information which has to be transferred. However, even when a computation uses this flexibility it will not use it all the time. Therefore, we propose to make the potential for reconfiguration itself reconfigurable. Such architectures are called hyperreconfigurable. Different models of hyperreconfigurable architectures are proposed in this paper. We also study a fundamental problem that emerges on such architectures, namely, to determine for a given computation when and how the potential for reconfiguration should be changed during run time so that the reconfiguration overhead is minimal. It is shown that the general problem is NP-hard but fast polynomial time algorithms are given to solve this problem for special types of hyperreconfigurable architectures. We define two example hyperreconfigurable architectures and illustrate the introduced concepts for corresponding application problems.  相似文献   

18.
Wearable computers are embedded into the mobile environment of their users. A design challenge for wearable systems is to combine the high performance required for tasks such as video decoding with the low energy consumption required to maximise battery runtimes and the flexibility demanded by the dynamics of the environment and the applications. In this paper, we demonstrate that reconfigurable hardware technology is able to answer this challenge. We present the concept and the prototype implementation of an autonomous wearable unit with reconfigurable modules (WURM). We discuss experiments that show the uses of reconfigurable hardware in WURM: ASICs-on-demand and adaptive interfaces. Finally, we present an experiment with an operating system layer for WURM.  相似文献   

19.
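Applying loop pipelining to coarse-grained reconfigurable architectures can yield significant performance gains. Loop control, pipeline synchronization, and efficient memory utilization are the key issues. This paper describes a hardware implementation of autonomous loop pipelining on the coarse-grained reconfigurable architecture LEAP. The approach is based on control units that support automatic scheduling of loop iterations, data-driven ALUs, and configurable static switch routing. By dynamically scheduling the operations within a loop, LEAP can exploit a higher degree of program parallelism, while distributed memory access and efficient data reuse improve bandwidth utilization. Experimental results show that LEAP achieves a speedup of 13.08× to 535.65× over a general-purpose processor.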

20.
In this paper, we propose a concept for multi-level reconfigurable architectures with more than two levels of reconfiguration, and study these architectures theoretically and experimentally. The proposed architectures are extensions of 2-level reconfigurable architectures where the reconfiguration operations on the lowest level correspond to the reconfiguration operations of standard 1-level reconfigurable architectures, and the reconfigurable units are simple switches. It is shown that finding an optimal number of reconfiguration levels and a corresponding reconfiguration scheme that minimizes the number of reconfiguration bits for a given algorithm can be done in polynomial time. However, finding the optimal number of reconfiguration levels is NP-hard for heterogeneous multi-level architectures, where the number of reconfiguration levels varies for the different reconfigurable units. Experimental results for different test applications show that 3–4 reconfiguration levels are optimal with respect to the number of reconfiguration bits needed. The number of reconfiguration bits is reduced by 35–86% compared to 1-level reconfiguration and by 8–34% compared to 2-level reconfiguration. The heterogeneous architecture reduces the number of necessary reconfiguration bits by an additional 1–5% and also needs fewer SRAM cells.
