首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 468 毫秒
1.
Due to the increasing demands on efficiency, performance and flexibility reconfigurable computational architectures are very promising candidates in embedded systems design. Recently coarse-grained reconfigurable array architectures (CGRAs), such as the ADRES CGRA and its corresponding DRESC compiler are gaining more popularity due to several technological breakthroughs in this area. We investigate the mapping of two image processing algorithms, Wavelet encoding and decoding, and TIFF compression on this novel type of array architectures in a systematic way. The results of our experiments show that CGRAs based on ADRES and its DRESC compiler technology deliver improved performance levels for these two benchmark applications when compared to results obtained on a state-of-the-art commercial DSP platform, the c64x DSP from Texas Instruments. ADRES/DRESC can beat its performance by at least 50% in cycle count and the power consumption even drops to 10% of the published numbers of the c64x DSP.  相似文献   

2.
Coarse-grained reconfigurable arrays (CGRAs) have shown potential for application in embedded systems in recent years. Numerous reconfigurable processing elements (PEs) in CGRAs provide flexibility while maintaining high performance by exploring different levels of parallelism. However, a difference remains between the CGRA and the application-specific integrated circuit (ASIC). Some application domains, such as software-defined radios (SDRs), require flexibility with performance demand increases. More effective CGRA architectures are expected to be developed. Customisation of a CGRA according to its application can improve performance and efficiency. This study proposes an application-specific CGRA architecture template composed of generic PEs (GPEs) and special PEs (SPEs). The hardware of the SPE can be customised to accelerate specific computational patterns. An automatic design methodology that includes pattern identification and application-specific function unit generation is also presented. A mapping algorithm based on ant colony optimisation is provided. Experimental results on the SDR target domain show that compared with other ordinary and application-specific reconfigurable architectures, the CGRA generated by the proposed method performs more efficiently for given applications.  相似文献   

3.
Recently coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their efficiency and flexibility. While many CGRAs have demonstrated impressive performance improvements, the effectiveness of CGRA platforms ultimately hinges on the compiler. Existing CGRA compilers do not model the details of the CGRA, and thus they are i) unable to map applications, even though a mapping exists, and ii) using too many processing elements (PEs) to map an application. In this paper, we model several CGRA details, e.g., irregular CGRA topologies, shared resources and routing PEs in our compiler and develop a graph drawing based approach, split-push kernel mapping (SPKM), for mapping applications onto CGRAs. On randomly generated graphs our technique can map on average 4.5times more applications than the previous approach, while generating mappings which have better qualities in terms of utilized CGRA resources. Utilizing fewer resources is directly translated into increased opportunities for novel power and performance optimization techniques. Our technique shows less power consumption in 71 cases and shorter execution cycles in 66 cases out of 100 synthetic applications, with minimum mapping time overhead. We observe similar results on a suite of benchmarks collected from Livermore loops, Mediabench, Multimedia, Wavelet and DSPStone benchmarks. SPKM is not a customized algorithm only for a specific CGRA template, and it is demonstrated by exploring various PE interconnection topologies and shared resource configurations with SPKM.  相似文献   

4.
Design Space Exploration (DSE) with multi-parametric objective in High Level Synthesis (HLS) involves assessing the various design points in the architecture design space to find the optimum solution for the design according to the system requirements specified. Due to the time to market pressure, the cost of solving the problem of architecture selection by exhaustive analysis is strictly forbidden. The tradeoffs linked to the selection of the appropriate design point during architecture evaluation needs careful assessment for efficient design space exploration. Further DSE requires satisfying multiple conflicting multi objective conditions such as increase in accuracy of evaluation during DSE with simultaneous speedup in the exploration process. This paper presents a novel hybrid design space exploration approach which is a combination of the Priority Factor (PF) method and Fuzzy search technique that is rapid and accurate in architecture evaluation and selection. The proposed approach for DSE when applied on a number of benchmarks yielded superior results compared to the current existing DSE approach for architecture selection. The comparison results of the proposed hybrid approach with the current existing approach for different benchmarks are shown and the speedups obtained are also presented.  相似文献   

5.
We evaluate the validity of the fundamental assumption behind application-specific programmable processors: that applications differ from each other in key parameters which are exploitable, such as the available instruction-level parallelism (ILP), demand on various hardware resources, and the desired mix of function units. Following the tradition of the CAD community, we develop an accurate chip area estimate and a set of aggressive hardware optimization algorithms. We follow the tradition of the architecture community by using comprehensive real-life benchmarks and production quality tools. This combination enables us to build a unique framework for system-level synthesis and to gain valuable insights about design and use of application-specific programmable processors for modern applications. We explore the application-specific programmable processor (ASSP) design space to understand the relationship between performance and area. The architecture model we used is the Hewlett Packard PA-RISC with single level caches. The system, including all memory and bus latencies, is simulated and no other specialized ALU or memory structures are being used. The experimental results reveal a number of important characteristics of the ASSP design space. For example, we found that in most cases a single programmable architecture performs similarly to a set of architectures that are tuned to individual application. A notable exception is highly cost sensitive designs, which we observe need a small number of specialized architectures that require smaller areas. Also, it is clear that there is enough parallelism in the typical media and communication applications to justify use of high number of function units. We found that the framework introduced in this paper can be very valuable in making early design decisions such as area and architectural configuration tradeoff, cache and issue width tradeoff under area constraint, and the number of branch units and issue width  相似文献   

6.
7.
This paper presents design, development and evaluation of an eXtra-large Scale, Homogeneous and a Heterogeneous Accelerator-Rich Platform (HARP2) for massively parallel signal processing algorithms. HARP is an integrated platform of multiple Coarse-Grained Reconfigurable Arrays (CGRAs) over a Network-on-Chip (NoC) where each CGRA is scaled and tailored for a specific application. The architecture of the NoC consists of nine nodes in a topology of 3-rows × 3-columns and acts as backbone of communication between different CGRAs. In this experimental work, the HARP template is used to instantiate a homogeneous (HARP-hom) and a heterogeneous (HARP-het) platform. The HARP-het is generated for a proof-of-concept test to verify the design and functionality of HARP. It also provides insight to many features of the design and evaluation in terms of different performance metrics. The other version (HARP-hom) is instantiated for a relatively realistic design problem, i.e., satisfying the execution-time constraints imposed on Fast Fourier Transform processing in IEEE-802.11n demodulators. Both of the versions of HARP are treated for comparative analysis using different performance metrics against some of the existing state-of-the-art platforms. The HARP versions are designed to illustrate large-scale homogeneous/heterogeneous multicore architectures while presenting the advantages of maximizing the number of reconfigurable processing resources on a single chip.  相似文献   

8.
We present a novel symbolic routability checking approach for three-dimensional interconnect layout. The model considered is a general architecture that can fit into different applications, such as ASIC, multichip modules, field-programmable gate arrays, and reconfigurable computing architectures. The method can incrementally incorporate additional constraints driven by timing, performance, and design. We used the latest satisfiability solver to validate the effectiveness of our approach. The experimental results demonstrate the encouraging performance on difficult routing benchmarks.  相似文献   

9.
Hardware/software (HW/SW) codesign and reconfigurable computing are commonly used methodologies for digital-systems design. However, no previous work has been carried out in order to define a HW/SW codesign methodology with dynamic scheduling for run-time reconfigurable architectures. In addition, all previous approaches to reconfigurable computing multicontext scheduling are based on static-scheduling techniques. In this paper, we present three main contributions: 1) a novel HW/SW codesign methodology with dynamic scheduling for discrete event systems using dynamically reconfigurable architectures; 2) a new dynamic approach to reconfigurable computing multicontext scheduling; and 3) a HW/SW partitioning algorithm for dynamically reconfigurable architectures. We have developed a whole codesign framework, where we have applied our methodology and algorithms to the case study of software acceleration. An exhaustive study has been carried out, and the obtained results demonstrate the benefits of our approach.  相似文献   

10.
This paper introduces a novel approach to rapid Design Space Exploration (DSE) and presents a formalized High Level Synthesis (HLS) design flow with multi-parametric optimization objective using the same design space exploration approach. The proposed approach resolves issues related to DSE such as the precision of evaluation, time exhausted during evaluation and also automation of the exploration process. During DSE a conflicting situation always exists for the designer to concurrently maximize the accuracy of the exploration process and minimize the time spent during DSE analysis. This technique is capable of drastically reducing the number of architectural variants to be analyzed for accurate selection of the optimal design point in a short time. Unlike other proposed DSE approaches, the focus here is on determining the Priority Factor (PF) of the resources for final organization of the design space in increasing or decreasing order without requiring graphs or hierarchical tree arrangements to analyze the candidate variants. The proposed approach is capable of simultaneously optimizing many performance parameters viz. time of execution, power consumption, hardware area and cost. The DSE results for many benchmarks are presented along with a comparison with an existing DSE approach that uses the hierarchal structure method for architecture evaluation. Results indicated significant improvement in speedup compared to the existing approach.  相似文献   

11.
In this paper, we propose a configuration-aware data-partitioning approach for reconfigurable computing. We show how the reconfiguration overhead impacts the data-partitioning process. Moreover, we explore the system-level power-performance tradeoffs available when implementing streaming embedded applications on fine-grained reconfigurable architectures. For a certain group of streaming applications, we show that an efficient hardware/software partitioning algorithm is required when targeting low power. However, if the application objective is performance, then we propose the use of dynamically reconfigurable architectures. We propose a design methodology that adapts the architecture and algorithms to the application requirements. The methodology has been proven to work on a real research platform based on Xilinx devices. Finally, we have applied our methodology and algorithms to the case study of image sharpening, which is required nowadays in digital cameras and mobile phones.  相似文献   

12.
Most of the coarse-grained reconfigurable architectures (CGRAs) are composed of reconfigurable ALU arrays and configuration cache (or context memory) to achieve high performance and flexibility. Specially, configuration cache is the main component in CGRA that provides distinct feature for dynamic reconfiguration in every cycle. However, frequent memory-read operations for dynamic reconfiguration cause much power consumption. Thus, reducing power in configuration cache has become critical for CGRA to be more competitive and reliable for its use in embedded systems. In this paper, we propose dynamically compressible context architecture for power saving in configuration cache. This power-efficient design of context architecture works without degrading the performance and flexibility of CGRA. Experimental results show that the proposed approach saves up to 39.72% power in configuration cache with negligible area overhead (2.16%).   相似文献   

13.
Coarse-grained reconfigurable architectures (CGRAs) require many processing elements (PEs) and a configuration memory unit (configuration cache) for reconfiguration of its PE array. Although this structure is meant for high performance and flexibility, it consumes significant power. Specially, power consumption by configuration cache is explicit overhead compared to other types of intellectual property (IP) cores. Reducing power is very crucial for CGRA to be more competitive and reliable processing core in embedded systems. In this paper, we propose a reusable context pipelining (RCP) architecture to reduce power-overhead caused by reconfiguration. It shows that the power reduction can be achieved by using the characteristics of loop pipelining, which is a multiple instruction stream, multiple data stream (MIMD)-style execution model. RCP efficiently reduces power consumption in configuration cache without performance degradation. Experimental results show that the proposed approach saves much power even with reduced configuration cache size. Power reduction ratio in the configuration cache and the entire architecture are up to 86.33% and 37.19%, respectively, compared to the base architecture.  相似文献   

14.
With advances in process technology, soft errors are becoming an increasingly critical design concern. Owing to their large area, high density, and low operating voltages, caches are worst hit by soft errors. Based on the observation that in multimedia applications, not all data require the same amount of protection from soft errors, we propose a partially protected cache (PPC) architecture, in which there are two caches, one protected and the other unprotected at the same level of memory hierarchy. We demonstrate that as compared to the existing unprotected cache architectures, PPC architectures can provide 47 times reduction in failure rate, at only 1% runtime and 3% power overheads. In addition, the failure rate reduction obtained by PPCs is very sensitive to the PPC cache configuration. Therefore, this observation provides an opportunity for further improvement of the solution by correctly parameterizing the PPC configurations. Consequently, we develop design space exploration (DSE) strategies to discover the best PPC configuration. Our DSE technique can reduce the exploration time by more than six times as compared to an exhaustive approach.   相似文献   

15.
Reconfigurable hardware has become a well-accepted option for implementing digital signal processing (DSP). Traditional devices such as field-programmable gate arrays offer good fine-grain flexibility. More recent coarse-grain reconfigurable architectures are optimized for word-length computations. We have developed a medium-grain reconfigurable architecture that combines the advantages of both approaches. Modules such as multipliers and adders are mapped onto blocks of 4-bit cells. Each cell contains a matrix of lookup tables that either implement mathematics functions or a random-access memory. A hierarchical interconnection network supports data transfer within and between modules. We have created software tools that allow users to map algorithms onto the reconfigurable platform. This paper analyzes the implementation of several common benchmarks, ranging from floating-point arithmetic to a radix-4 fast Fourier transform. The results are compared to contemporary DSP hardware.  相似文献   

16.
Reconfigurable hardware is ideal for use in systems-on-a-chip (SoC), as it provides both hardware-level performance and post-fabrication flexibility. However, any one architecture is rarely equally optimized for all applications. SoCs targeting a specific set of applications can greatly benefit from incorporating customized reconfigurable logic instead of generic field-programmable gate-array (FPGA) logic. Unfortunately, manually designing a domain-specific architecture for every SoC would require significant design time. Instead, this paper discusses our initial efforts towards creating a reconfigurable hardware generator capable of automatically creating flexible, yet domain-specific, designs. Our tests indicate that our generated architectures are more than 5times smaller than equivalent FPGA implementations and nearly as area-efficient as standard cell designs. We also use a novel technique employing synthetic circuit generation to demonstrate the flexibility of our architecture generation techniques.  相似文献   

17.
Dynamically reconfigurable architectures are emerging as a viable design alternative to implement a wide range of computationally intensive applications. At the same time, an urgent necessity has arisen for support tool development to automate the design process and achieve optimal exploitation of the architectural features of the system. Task scheduling and context (configuration) management become very critical issues in achieving the high performance that digital signal processing (DSP) and multimedia applications demand. This article proposes a strategy to automate the design process which considers all possible optimizations that can be carried out at compilation time, regarding context and data transfers. This strategy is general in nature and could be applied to different reconfigurable systems. We also discuss the key aspects of the scheduling problem in a reconfigurable architecture such as MorphoSys. In particular, we focus on a task scheduling methodology for DSP and multimedia applications, as well as the context management and scheduling optimizations  相似文献   

18.
The design space exploration (DSE) problem addressed in this paper is to find out Multi-Processor System-on-Chip architectures for a given multi-task signal processing application aiming to minimize the system cost while satisfying the real-time constraints. It involves the following three sub-problems: selecting processing elements, mapping an application to the processing elements, and determining the communication architecture. The proposed approach consists of two inner design loops: one is a cosynthesis loop that determines the selection of PEs and the mapping of a given application to the PEs, and the other is a communication architecture synthesis loop to find the hierarchical shared bus architecture. We specify an application with a synchronous data flow (SDF) model of computation that has well-matched semantics with the algorithmic function flow in DSP applications. To solve the problem, we need to compare the estimated performance of design points and choose the best ones. The common method of simulation-based performance estimation is too time-consuming to explore the wide design space. Thanks to the analytical properties of the SDF model, the performance estimation can be done without HW/SW cosimulation in both loops. A global feedback from the communication architecture synthesis step to the cosynthesis step forms the proposed DSE framework. We use a real-life application, 4-channel Digital Video Recorder (DVR) that is a multi-task example, as well as randomly generated graphs to show the viability of the proposed approach.  相似文献   

19.
20.
A new reconfigurable architectural template is presented. Such a template is composed of coarse-grained and fine-grained reconfigurable datapath and control to obtain performances at custom designed chip level. To show the adaptability/performance of such architectural template, the architecture has been customized (i.e. datapath and control features of the template have been properly sized) for multimedia application domain. To evaluate complexity and maximum clock frequency of the proposed architecture, it has been synthesized using Synopsys Design Compiler on a standard-cell 0.18 μ m technology. Estimated number of transistors is 335 K, while maximum allowable frequency is 460 MHz. Performances have been evaluated comparing the number of clock cycles and the processing time required to process application domain dominant kernels with commercial devices: we obtained up to 95% reduction with respect to ARM and up to 94% reduction with respect to TMS320C5510 in terms of clock cycles. Salvatore M. Carta (1997 Electronic Eng. Master. 2002 Electronics and Computer Science PhD) joined the Department of Electrical and Electronics Engineering of the University of Cagliari, Italy in 1998 as PhD student. From 2005 he has been assistant professor in Department of Mathematics and Computer Science of the University of Cagliari. His research interests focus mainly on architectures, software and tools for embedded and portable computing, with particular emphasis on: languages, architectures and compilers for reconfigurable and parallel computing; Networks-on-chip; Operating systems for multiprocessor-systems-on-chip; low power real-time scheduling algorithms. Danilo Pani (2002 Electronic Eng. Master, 2006 Electronics and Computer Science PhD) joined the Department of Electrical and Electronics engineering of the University of Cagliari, Italy in 2002 as Electronics and Computer Science PhD student. His primary research interests are in the area of Digital Signal Processing architectures and systems, Biomedical Engineering, Reconfigurable Systems and Cooperative VLSI architectures for distributed computing. Luigi Raffo (1989 Master, 1994 Electronics and Computer Science PhD) joined Department of Electrical and Electronics Engineering of the University of Cagliari, Italy in 1994 as assistant professor. From 1998 he has been professor of Digital System Design, Integrated Systems Architectures and Microelectronics at the same Department. His research activity is mainly in the design of low-power analog and digital architectures/chips. He has been project manager of many local and international projects. He is author of more than 50 international papers in the field.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号