Similar literature
1.
Partial reconfiguration (PR) of FPGAs can be used to dynamically extend and adapt the functionality of computing systems by swapping HW tasks in and out. To coordinate the on-demand task execution, we propose and implement a Run-Time System Manager (RTSM) for scheduling software (SW) tasks on available processor(s) and hardware (HW) tasks on any number of reconfigurable regions (RRs) of a partially reconfigurable FPGA. Fed with the initial partitioning of the application into tasks, the corresponding task graph, and the available task mappings, the RTSM controls system operation considering the status of each task and region (e.g. busy, idle, scheduled for reconfiguration/execution, etc.). Our RTSM supports task reuse and configuration prefetching to minimize reconfigurations, task movement among regions to efficiently manage the FPGA area, and region reservation for future reconfiguration and execution. We validate the correctness and portability of our RTSM by executing an image processing application on two Xilinx-based platforms: ZedBoard and XUPV5. We also perform a more extensive evaluation of its features using a simulation framework, and find that, despite the technology limitations, our approach can give promising results in terms of scheduling quality. Since our RTSM also supports the scheduling of parallel SW tasks, we use it to manage the execution of the entire parallel Edge Detection application on a desktop; we compare the application execution time with that using the OpenMP framework and find that with our RTSM execution is 2.4 times faster than the unoptimized OpenMP version. When processor affinity optimization is enabled for OpenMP, our RTSM and OpenMP are on par, indicating that the scheduling efficiency of our RTSM is competitive with this state-of-the-art scheduler, while additionally supporting the management of HW tasks.

2.
In this paper, we propose a configuration-aware data-partitioning approach for reconfigurable computing. We show how the reconfiguration overhead impacts the data-partitioning process. Moreover, we explore the system-level power-performance tradeoffs available when implementing streaming embedded applications on fine-grained reconfigurable architectures. For a certain group of streaming applications, we show that an efficient hardware/software partitioning algorithm is required when targeting low power. However, if the application objective is performance, then we propose the use of dynamically reconfigurable architectures. We propose a design methodology that adapts the architecture and algorithms to the application requirements. The methodology has been proven to work on a real research platform based on Xilinx devices. Finally, we have applied our methodology and algorithms to the case study of image sharpening, which is required nowadays in digital cameras and mobile phones.

3.
The need for low-power embedded systems has become very significant in microelectronics in recent years. A power-driven methodology is mandatory during embedded systems design to meet system-level requirements while fulfilling time-to-market. The aim of this paper is to introduce accurate and efficient power metrics included in a hardware/software (HW/SW) codesign environment to guide the system-level partitioning. Power evaluation metrics have been defined to widely explore the architectural design space at a high abstraction level. This is one of the first approaches that globally considers HW and SW contributions to power in a system-level design flow for control-dominated embedded systems.

4.
In this paper, we present an efficient HW/SW codesign architecture for an H.263 video encoder and its FPGA implementation. Each module of the encoder is investigated to determine whether a HW or a SW implementation better achieves real-time processing speed as well as flexibility. The hardware portions include the Discrete Cosine Transform (DCT), inverse DCT (IDCT), quantization (Q) and inverse quantization (IQ). The remaining parts were realized in software executed by the NIOS II softcore processor. This paper also introduces efficient design methods for HW and SW modules. In hardware, an efficient architecture for the 2-D DCT/IDCT is suggested to reduce the chip size. NIOS II custom instruction logic is used to implement Q/IQ. A software optimization technique is also explored by using the fast block-matching algorithm for motion estimation (ME). The whole design is described in VHDL, verified in simulations and implemented in a Stratix II EP2S60 FPGA. Finally, the encoder has been tested on the Altera NIOS II development board and can work at up to 120 MHz. Implementation results show that when HW/SW codesign is used, a 15.8 to 16.5 times improvement in coding speed is obtained compared to the software-based solution.

5.
This article presents a systematic approach to hardware/software codesign targeting data-intensive applications. It focuses on application processes that can be represented as directed acyclic graphs (DAGs) and that use a synchronous dataflow (SDF) model, the popular form of dataflow employed in DSP systems, when running the process. The codesign system is based on the UltraSONIC reconfigurable platform, a system designed jointly at Imperial College and the SONY Broadcast Laboratory. This system is modeled as a loosely coupled structure consisting of a single instruction processor and multiple reconfigurable hardware elements. The paper also introduces and demonstrates a task-based hardware/software codesign environment specialized for real-time video applications. Both the automated partitioning and scheduling environment and the task manager program help to provide a fast, robust environment for supporting demanding applications in the codesign system.

6.
In this paper, we analyze the main issues in context scheduling for multicontext reconfigurable architectures from a formal point of view. We first provide an intuitive approach, which is later supported by a detailed analysis of the mathematical relations that express the reconfiguration process. This enables us to deduce a methodology for the minimization of context loading overhead, which considers the tradeoff between achievable system performance and algorithm efficiency. In this respect, the necessary conditions for optimality are established in order to devise an optimal search. However, as this approach is very time consuming, we propose some heuristic techniques that reduce the algorithm complexity and achieve very good results in relatively short execution time. This work has been developed as part of an automated design environment for reconfigurable systems. A set of experiments has been carried out to validate the theoretical results.

7.
The Rapid Prototyping of Application-Specific Signal Processors (RASSP) [1–3] program of the US Department of Defense (ARPA and Tri-Services) targets a 4X improvement in the design, prototyping, manufacturing, and support processes (relative to current practice). Based on a current practice study (1993) [4], the prototyping time from system requirements definition to production and deployment, of multiboard signal processors, is between 37 and 73 months. Out of this time, 25–49 months is devoted to detailed hardware/software (HW/SW) design and integration (with 10–24 months devoted to the latter task of integration). With the utilization of a promising top-down hardware-less codesign methodology based on VHDL models of HW/SW components at multiple abstractions, reduction in design time has been shown especially in the area of hardware/software integration [5]. The authors describe a top-down design approach in VHDL starting with the capture of system requirements in an executable form and through successive stages of design refinement, ending with a detailed hardware design. This hardware/software codesign process is based on the RASSP program design methodology called virtual prototyping, wherein VHDL models are used throughout the design process to capture the necessary information to describe the design as it develops through successive refinement and review. Examples are presented to illustrate the information captured at each stage in the process. Links between stages are described to clarify the flow of information from requirements to hardware.

8.
The complexity of hardware/software (HW/SW) interfacing and the lack of portability across different platforms restrain the widespread use of reconfigurable accelerators and limit designer productivity. Furthermore, communication between the SW and HW parts of codesigned applications is typically exposed to SW programmers and HW designers. In this work, we introduce a virtualization layer that allows reconfigurable application-specific coprocessors to access the user-space virtual memory and share the memory address space with user applications. The layer, consisting of an operating system (OS) extension and a HW component, shifts the burden of moving data between processor and coprocessor from the programmer to the OS, lowers the complexity of interfacing, and hides physical details of the system. Not only does the virtualization layer enhance programming abstraction and portability, but it also performs runtime optimizations: by predicting future memory accesses and speculatively prefetching data, the virtualization layer improves coprocessor execution, so applications achieve better performance without any user intervention. We use two different reconfigurable systems-on-chip (SoCs) running Linux and codesigned applications to prove the viability of our concept. The applications run faster than their SW versions, and the overhead due to the virtualization is limited. Dynamic prefetching in the virtualization layer further reduces the abstraction overhead.

9.
Partial dynamic reconfiguration, often called run-time reconfiguration (RTR), is a key feature in modern reconfigurable platforms. In this paper, we present parallelism granularity selection (PARLGRAN), an application mapping approach that maximizes performance of application task chains on architectures with such capability. PARLGRAN essentially selects a suitable granularity of data-parallelism for individual data-parallel tasks while considering key issues such as significant reconfiguration overhead and placement constraints. It integrates granularity selection very effectively in a joint scheduling and placement formulation, necessary due to constraints imposed by partial RTR. As a key step to validating PARLGRAN, we additionally present an exact strategy (integer linear programming formulation). We demonstrate that PARLGRAN generates high-quality schedules with: (1) a set of small test cases where we compare our results with the exact strategy; (2) a very large set of synthetic experiments with over a thousand data points where we compare it with a simpler strategy that tries to statically maximize data-parallelism, i.e., only considers resource availability; and (3) a detailed application case study of JPEG encoding. The application case study confirms that blindly maximizing data-parallelism can result in schedules even worse than those generated by a simple (but RTR-aware) approach oblivious to data-parallelism. Last, but very important, we demonstrate that our approach is well-suited for true on-demand computing with detailed execution time estimates on a typical embedded processor. Heuristic execution time is comparable to task execution time, i.e., it is feasible to integrate PARLGRAN in a run-time scheduler for dynamically reconfigurable architectures.

10.
When performing hardware/software co-design for embedded systems, the question arises of which functions of the system should be implemented in hardware (HW) and which in software (SW). This problem is known as HW/SW partitioning. Over the last 10 years, a significant research effort has been carried out in this area. In this paper, we present two new approaches to solve the HW/SW partitioning problem by using verification techniques based on satisfiability modulo theories (SMT). We compare the results with those of the traditional technique of integer linear programming, specifically binary integer programming, and of a modern optimization method based on a genetic algorithm. The experimental results show that, compared with traditional techniques, SMT-based verification techniques can be effective in particular cases at solving the HW/SW partitioning problem optimally using a state-of-the-art model checker based on SMT solvers.
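To make the binary formulation mentioned in this abstract concrete, the following is a minimal sketch, not the authors' method. It assumes one common textbook formulation (each function gets a 0/1 decision, the total hardware cost is minimized, and the total software execution time must stay within a bound) and solves it by exhaustive enumeration rather than with an ILP or SMT solver. The names `exact_partition`, `hw_cost`, `sw_time` and `max_sw_time` are hypothetical.

```python
from itertools import product


def exact_partition(hw_cost, sw_time, max_sw_time):
    """Exhaustively solve a toy HW/SW partitioning problem.

    Assumed formulation (illustrative only):
      x[i] = 1 puts function i in hardware, x[i] = 0 keeps it in software.
      Minimize   sum(hw_cost[i] for x[i] == 1)
      subject to sum(sw_time[i] for x[i] == 0) <= max_sw_time.

    Returns (best_cost, assignment), or (None, None) if no assignment
    satisfies the time bound. Runtime is O(2^n), so this is only
    suitable for very small instances.
    """
    n = len(hw_cost)
    best_cost, best_assign = None, None
    for assign in product((0, 1), repeat=n):
        # Total time of the functions left in software.
        t = sum(sw_time[i] for i in range(n) if assign[i] == 0)
        if t > max_sw_time:
            continue  # violates the software time bound
        # Total cost of the functions moved to hardware.
        c = sum(hw_cost[i] for i in range(n) if assign[i] == 1)
        if best_cost is None or c < best_cost:
            best_cost, best_assign = c, assign
    return best_cost, best_assign


if __name__ == "__main__":
    cost, assign = exact_partition([5, 3, 4], [4, 6, 2], max_sw_time=6)
    print(cost, assign)
```

A solver-based approach explores the same 0/1 decision space symbolically instead of enumerating it; the enumeration here only serves to show the objective and the constraint.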

11.
Hardware/software (HW/SW) partitioning and scheduling are crucial steps in HW/SW co-design. It has been shown that they are classical combinatorial optimization problems. Because tasks may execute sequentially or concurrently, HW/SW partitioning and scheduling have become more difficult to solve optimally. In this paper, more efficient heuristic algorithms are proposed for HW/SW partitioning and scheduling. The proposed partitioning algorithm partitions a task graph by iteratively moving the task with the highest benefit-to-area ratio first. The benefit-to-area ratio is updated in each iteration to account for task concurrency. The proposed scheduling algorithm gives higher priority to tasks lying on the hardware-only critical path to enhance the task forecast. A large body of experimental results conclusively shows that the proposed heuristic algorithm for partitioning is superior to the latest efficient combinatorial algorithm (Tabu search) cited in this paper. Moreover, the Tabu search for partitioning has been further improved by utilizing the proposed heuristic solution as its initial solution. In addition, the proposed scheduling algorithm obtains improvements over the most widely used approaches of up to 10% without a large increase in running time. This work was presented in part at the 2006 IEEE International Conference on Field Programmable Technology (ICFPT).
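The benefit-to-area idea in this abstract can be illustrated with a small greedy sketch. This is a simplified reading, not the paper's algorithm: it assumes purely sequential execution, so the ratio of each task never changes, whereas the abstract states that the ratio is updated in each iteration to account for concurrency. It also assumes that "moving" means moving a task from software to hardware under an area budget. `Task`, `greedy_partition`, `total_time` and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    sw_time: float  # execution time if run in software
    hw_time: float  # execution time if run in hardware
    area: float     # hardware area needed to implement the task


def greedy_partition(tasks, area_budget):
    """Move tasks to hardware in order of benefit-to-area ratio.

    Benefit is assumed to be the time saved (sw_time - hw_time).
    In each iteration the task with the highest positive ratio that
    still fits into the remaining area is moved to hardware.
    Returns the set of task names mapped to hardware.
    """
    hw = set()
    remaining = area_budget
    candidates = list(tasks)
    while True:
        best, best_ratio = None, 0.0
        for t in candidates:
            if t.area <= 0 or t.area > remaining:
                continue  # skip tasks that do not fit (or have no valid area)
            ratio = (t.sw_time - t.hw_time) / t.area
            if ratio > best_ratio:
                best, best_ratio = t, ratio
        if best is None:
            break  # nothing beneficial fits any more
        hw.add(best.name)
        remaining -= best.area
        candidates.remove(best)
    return hw


def total_time(tasks, hw):
    """Sequential execution time of all tasks for a given partition."""
    return sum(t.hw_time if t.name in hw else t.sw_time for t in tasks)


if __name__ == "__main__":
    tasks = [Task("a", 10, 2, 4), Task("b", 8, 4, 1), Task("c", 6, 5, 5)]
    hw = greedy_partition(tasks, area_budget=5)
    print(sorted(hw), total_time(tasks, hw))
```

Under concurrency, the time saved by moving one task depends on what else runs in parallel, which is why a faithful implementation would recompute the ratios inside the loop from the current schedule.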

12.
Dynamically reconfigurable architectures are emerging as a viable design alternative to implement a wide range of computationally intensive applications. At the same time, an urgent need has arisen for support tools that automate the design process and achieve optimal exploitation of the architectural features of the system. Task scheduling and context (configuration) management become critical issues in achieving the high performance that digital signal processing (DSP) and multimedia applications demand. This article proposes a strategy to automate the design process which considers all possible optimizations that can be carried out at compilation time, regarding context and data transfers. This strategy is general in nature and could be applied to different reconfigurable systems. We also discuss the key aspects of the scheduling problem in a reconfigurable architecture such as MorphoSys. In particular, we focus on a task scheduling methodology for DSP and multimedia applications, as well as the context management and scheduling optimizations.

13.
This paper proposes the Tissue methodology (TM) as a novel methodology for the analysis, design and synthesis of networked embedded systems and the subsequent development of distributed architectural frameworks. The proposed method aims at reducing the development time through the use of reconfigurable HW/SW components and the application of automatic code generation techniques. We show the usefulness of the proposed methodology in the context of mobile ad-hoc networks (MANET) which exploit Software Radio (SR) technology for reconfigurability. Drawbacks of current design and simulation tools and advantages coming from the application of the TM are discussed in the paper.

14.
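Current status and analysis of research on software/hardware co-design methodology   Total citations: 3 (self-citations: 0; citations by others: 3)
By briefly reviewing the current state of the integrated circuit industry, this paper motivates the importance of research on software/hardware co-design methodology and explains the main concepts of software/hardware co-design. It then surveys the relevant international results and mainstream directions and, drawing on an analysis of problems in related fields, examines in depth the commonalities and shortcomings of these research orientations. Finally, on this basis, it tentatively proposes a constructive line of research that a "full-custom" software/hardware co-design methodology should follow.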

15.
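Hardware/software co-verification is an important part of system-on-chip (SoC) design. To meet the specific requirements of a control SoC based on a 32-bit CPU core, a hardware/software co-verification strategy for SoCs is proposed and a co-verification environment is built. The environment exploits the processor core model's support for the core instruction set to run functional test programs, enabling synchronized debugging of SoC hardware and software and fast localization of simulation errors in both, which effectively improves simulation efficiency. The co-verification environment met its design goals and offers a useful reference for other SoC designs.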

16.
This paper addresses the initial acquisition for the UMTS-FDD W-CDMA standard. It presents an innovative cell searcher design optimized for acquisition speed and low power consumption [1]. The proposed architecture relies on a memory-based digital matched filter with permuted processing order. We can reconfigure the same filtering hardware to process the three steps of the UMTS-FDD initial cell search, due to a thorough HW/SW codesign, in particular the implementation of an innovative algorithm for the second step of the initial acquisition. The searcher can also perform other functions such as initial delay profiling, neighboring cell search and idle-mode timing alignment, which are typically carried out, at least partially, by the Rake receiver. These additional capabilities allow for further system-level power savings because they avoid activating the Rake and reduce the RF front-end working time to a bare minimum.

17.
Design automation for embedded systems comprising both hardware and software components demands code generators integrated into electronic CAD systems. These code generators provide the necessary link between software synthesis tools in HW/SW codesign systems and embedded processors. General-purpose compilers for standard processors are often insufficient, because they do not provide flexibility with respect to different target processors and also suffer from inferior code quality. While recent research on code generation for embedded processors has primarily focussed on code quality issues, in this contribution we emphasize the importance of retargetability, and we describe an approach to achieve retargetability. We propose the use of uniform, external target processor models in code generation, which describe embedded processors by means of RT-level netlists. Such structural models incorporate more hardware details than purely behavioral models, thereby permitting a close link to hardware design tools and fast adaptation to different target processors. The MSSQ compiler, which is part of the MIMOLA hardware design system, operates on structural models. We describe input formats, central data structures, and code generation techniques in MSSQ. The compiler has been successfully retargeted to a number of real-life processors, which demonstrates the feasibility of our approach with respect to retargetability. We discuss capabilities and limitations of MSSQ, and identify possible areas of improvement.

18.
In a multiprocessor system-on-chip (MPSoC), tasks and communications should be scheduled carefully since their execution order affects the performance of the entire system. When we implement an MPSoC according to the scheduling result, we may find that the scheduling result is not correct or that timing constraints are not met unless the schedule takes into account the delays of the MPSoC architecture. The unexpected scheduling results are mainly caused by inaccurate communication delays and/or the runtime scheduler's overhead. Due to the high complexity of the scheduling problem, most previous work neglects inter-processor communication, or just assumes a fixed delay proportional to the communication volume, without taking into consideration subtle effects such as communication congestion and synchronization delay, which may change dynamically throughout task execution. In this paper, we propose an accurate scheduling model of the hardware/software communication architecture to improve timing accuracy by taking into account the effects of dynamic software synchronization and detailed hardware resource constraints such as communication congestion and buffer sharing. We also propose a method for runtime scheduler implementation and consider its performance overhead in scheduling. In particular, we introduce efficient hardware and software scheduler architectures. Furthermore, we address the issue of centralized versus distributed implementation of the schedulers and investigate the pros and cons of the two scheduler implementations. Through experiments with significant demonstration examples, we show the effectiveness of the proposed approach.

19.
In this paper, we propose techniques for fast cycle-approximate multi-processor SoC simulation with timed transaction level models and OS models. Cycle-approximate simulation with an abstract model is widely used for fast validation of a multi-processor SoC in early design stages. However, the performance gain of abstract-level simulation is limited by the overhead of synchronizing multiple concurrent processor/module simulators, which is inevitable in timed simulation. To reduce the synchronization overhead, we adopt the synchronization time-point prediction method, which consists of two phases: static code analysis and dynamic scheduling of synchronizations. In the static analysis phase before simulation, it estimates the minimum execution time from every point in the code to the nearest synchronization point. Then, during simulation, it pessimistically predicts the synchronization time-points based on the estimates. The proposed approach targets fast cycle-approximate simulation of a system with delay-annotated SW code and transaction level models of HW with dynamic behavior. We present, in this paper, techniques to analyze such abstract models of SW and HW and to schedule a minimal number of synchronizations during cycle-approximate simulation of the models. Experiments show that the approach achieves orders of magnitude higher performance in cycle-approximate multi-processor SoC simulation.
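The static analysis phase described in this abstract (minimum execution time from every code point to the nearest synchronization point) can be sketched as a shortest-path computation. The following is only an illustration under stated assumptions, not the paper's analysis: the code is modeled as a control-flow graph whose nodes carry non-negative delay annotations, synchronization points contribute zero remaining time, and the minimum is found with Dijkstra's algorithm on the reversed graph. `min_time_to_sync`, `delay`, `succ` and `sync_points` are hypothetical names.

```python
import heapq


def min_time_to_sync(delay, succ, sync_points):
    """Minimum time from each node to its nearest synchronization point.

    delay:       dict node -> non-negative delay spent executing the node
    succ:        dict node -> list of successor nodes (control-flow edges)
    sync_points: iterable of nodes that are synchronization points

    For a non-sync node n the result is
        delay[n] + min(result[s] for s in succ[n]),
    and 0 for sync points. Nodes that cannot reach any sync point get
    float("inf"). Every node is assumed to appear as a key in `delay`.
    """
    # Build the reversed graph so the search can start at the sync points.
    pred = {n: [] for n in delay}
    for n, outs in succ.items():
        for s in outs:
            pred[s].append(n)

    dist = {n: float("inf") for n in delay}
    heap = []
    for s in sync_points:
        dist[s] = 0
        heapq.heappush(heap, (0, s))

    while heap:
        d, n = heapq.heappop(heap)
        if d > dist[n]:
            continue  # stale heap entry
        for p in pred[n]:
            cand = delay[p] + d  # execute p, then continue from n
            if cand < dist[p]:
                dist[p] = cand
                heapq.heappush(heap, (cand, p))
    return dist


if __name__ == "__main__":
    delay = {"a": 3, "b": 5, "c": 1, "s": 0}
    succ = {"a": ["b", "c"], "b": ["s"], "c": ["s"], "s": []}
    print(min_time_to_sync(delay, succ, ["s"]))
```

Because the estimate is a lower bound on the time to the next synchronization, a simulator that waits at least that long before synchronizing never skips past a real synchronization point, which matches the pessimistic prediction described in the abstract.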
