
1.

A Safari Through the MPSoC Run-Time Management Jungle

Vincent Nollet Diederik Verkest Henk Corporaal 《Journal of Signal Processing Systems》2010,60(2):251-268

The multiprocessor SoC (MPSoC) revolution is fueled by the need to execute multiple advanced multimedia applications on a single embedded computing platform. At design-time, the applications that will run in parallel and their respective user requirements are unknown. Hence, a run-time manager (RTM) is needed to match all application needs with the available platform resources and services. Creating such a run-time manager requires two decisions. First, one needs to decide what functionality to implement. Second, one has to decide how to implement this functionality in order to meet boundary conditions such as real-time performance. This paper is the first to detail a generic view on MPSoC run-time management functionality and its design space trade-offs. We substantiate the run-time components and the implementation trade-offs with academic state-of-the-art solutions and a brief overview of some industrial multiprocessor run-time management examples. We show a clear trend towards more hardware acceleration, a limited distribution of management functionality over the platform and increasing support for adaptive multimedia applications. In addition, we briefly detail upcoming run-time management research issues.

2.

Hardware/Software Co-Design of Run-Time Schedulers for Real-Time Systems

Vincent John Mooney III Giovanni De Micheli 《Design Automation for Embedded Systems》2000,6(1):89-144

We present the Serra Run-Time Scheduler Synthesis and Analysis Tool which automatically generates a run-time scheduler from a heterogeneous system-level specification in both Verilog HDL and C. Part of the run-time scheduler is implemented in hardware, which allows the scheduler to be predictable in being able to meet hard real-time constraints, while part is implemented in software, thus supporting features typical of software schedulers. Serra's real-time analysis generates a priority assignment for the software tasks in the mixed hardware-software system. The tasks in hardware and software have precedence constraints, resource constraints, relative timing constraints, and a rate constraint. A heuristic scheduling algorithm assigns the static priorities such that a hard real-time rate constraint can be predictably met. Serra supports the specification of critical regions in software, thus providing the same functionality as semaphores. We describe the task control/data-flow extraction, synthesis of the control portion of the run-time scheduler in hardware, real-time analysis and priority scheduler template. We also show how our approach fits into an overall tool flow and target architecture. Finally, we conclude with a sample application of the novel run-time scheduler synthesis and analysis tool to a robotics design example.

3.

A Hybrid Task Scheduling Algorithm for Heterogeneous Multiprocessor Systems

张俊祥 冯金富 于心一 《电光与控制》2011,18(12):39-43

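To address task scheduling in real-time heterogeneous systems, a hybrid real-time task scheduling algorithm for heterogeneous multiprocessor systems is proposed. The algorithm uses the EDF (Earliest Deadline First) algorithm with an aperiodic server to schedule the task set on each individual processor, which makes full use of the processor's computational bandwidth. A heuristic search algorithm performs the task allocation, using the maximum remaining computational bandwidth as the search criterion, which keeps the load on the processors as balanced as possible...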
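The allocation heuristic mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes implicit-deadline periodic tasks, uses the classic single-processor EDF bound (total utilization at most 1), ignores the aperiodic server, and picks the processor that would keep the most bandwidth after placement. The names `Processor`, `allocate`, and the task tuple layout are hypothetical.

```python
from dataclasses import dataclass, field

EPS = 1e-9  # tolerance for floating-point utilization sums


@dataclass
class Processor:
    name: str
    utilization: float = 0.0  # sum of C_i / T_i of the tasks placed here
    tasks: list = field(default_factory=list)

    @property
    def remaining(self):
        # Assumption: EDF with implicit deadlines is feasible while U <= 1.
        return 1.0 - self.utilization


def allocate(tasks, processors):
    """Place each task on the processor with the most bandwidth left.

    tasks: list of (name, period, {processor_name: wcet}); the per-processor
    worst-case execution times model the heterogeneity of the platform.
    Returns (mapping, unplaced).
    """
    mapping, unplaced = {}, []
    for name, period, wcet in tasks:
        best = None  # (bandwidth left after placement, processor, utilization)
        for proc in processors:
            if proc.name not in wcet:
                continue  # task cannot run on this processor type
            u = wcet[proc.name] / period
            left = proc.remaining - u
            if left < -EPS:
                continue  # placement would break the EDF bound
            if best is None or left > best[0]:
                best = (left, proc, u)
        if best is None:
            unplaced.append(name)
            continue
        _, proc, u = best
        proc.utilization += u
        proc.tasks.append(name)
        mapping[name] = proc.name
    return mapping, unplaced
```

Choosing the processor with the largest remaining bandwidth is a worst-fit policy, which tends to even out the load rather than pack processors tightly.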

4.

Reconfigurable topology synthesis for application-specific NoC on partially dynamically reconfigurable systems

《Integration, the VLSI Journal》2019

In this paper, a four-stage method for synthesizing reconfigurable ASNoC topology is proposed for partially dynamically reconfigurable systems, where the topology is reconfigured dynamically at run-time along with the application's execution. Firstly, a simulated annealing based topology-aware integrated optimization framework is proposed to generate the proper schedule and floorplan of task modules. Secondly, based on the schedule and floorplan of task modules, an Integer Linear Programming (ILP)-based method and a heuristic method are proposed to partition the communication requirements of the application into T time intervals. Thirdly, we explore the proper positions of switches in the floorplan for global communications. Finally, considering the reconfiguration costs between adjacent time intervals, the routing path allocation problem is solved for time intervals in an iterative procedure to generate fine-grained dynamically reconfigurable ASNoC topologies. Experimental results show that, compared to the random partition of communication requirements, the proposed heuristic method and ILP-based method can achieve 5.4% and 10.0% power consumption improvement, respectively. And, the reconfigurable ASNoC can achieve 31.6% power consumption improvement when compared with static ASNoC.

5.

A design flow for speeding-up DSP applications in heterogeneous reconfigurable systems

Michalis D. Galanis Athanassios Milidonis Athanassios P. Kakarountas Costas E. Goutis 《Microelectronics Journal》2006,37(6):554-564

In this paper, we propose a method for speeding up Digital Signal Processing applications by partitioning them between reconfigurable hardware blocks of different granularity and mapping critical parts of applications on coarse-grain reconfigurable hardware. The reconfigurable hardware blocks are embedded in a heterogeneous reconfigurable system architecture. The fine-grain part is implemented by an embedded FPGA unit, while for the coarse-grain reconfigurable hardware our developed high-performance coarse-grain data-path is used. The design flow mainly consists of three steps: the analysis procedure, the mapping onto coarse-grain blocks, and the mapping onto the fine-grain hardware. In this work, the methodology is validated using five real-life applications: an OFDM transmitter, a medical imaging technique, a wavelet-based image compressor, a video compression scheme and a JPEG encoder. The experimental results show that the speedup, relative to an all-FPGA solution, ranges from 1.55 to 4.17 for the considered applications.

6.

Exploiting Application Data-Parallelism on Dynamically Reconfigurable Architectures: Placement and Architectural Considerations

Banerjee S. Bozorgzadeh E. Dutt N. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(2):234-247

Partial dynamic reconfiguration, often called run-time reconfiguration (RTR), is a key feature in modern reconfigurable platforms. In this paper, we present parallelism granularity selection (PARLGRAN), an application mapping approach that maximizes performance of application task chains on architectures with such capability. PARLGRAN essentially selects a suitable granularity of data-parallelism for individual data parallel tasks while considering key issues such as significant reconfiguration overhead and placement constraints. It integrates granularity selection very effectively in a joint scheduling and placement formulation, necessary due to constraints imposed by partial RTR. As a key step to validating PARLGRAN, we additionally present an exact strategy (integer linear programming formulation). We demonstrate that PARLGRAN generates high-quality schedules with: (1) a set of small test cases where we compare our results with the exact strategy; (2) a very large set of synthetic experiments with over a thousand data-points where we compare it with a simpler strategy that tries to statically maximize data-parallelism, i.e., only considers resource availability; and (3) a detailed application case study of JPEG encoding. The application case-study confirms that blindly maximizing data-parallelism can result in schedules even worse than that generated by a simple (but RTR-aware) approach oblivious to data-parallelism. Last, but very important, we demonstrate that our approach is well-suited for true on-demand computing with detailed execution time estimates on a typical embedded processor. Heuristic execution time is comparable to task execution time, i.e., it is feasible to integrate PARLGRAN in a run-time scheduler for dynamically reconfigurable architectures.

7.

Run-time Task Overlapping on Multiprocessor Platforms

Zhe Ma Francky Catthoor 《Journal of Signal Processing Systems》2010,60(2):169-182

Today’s embedded applications often consist of multiple concurrent tasks. These tasks are decomposed into sub-tasks which are in turn assigned and scheduled on multiple different processors to achieve Pareto-optimal performance/energy combinations. Previous work introduced systematic approaches to explore performance-energy trade-offs for each individual task and used the exploration results at run-time to fulfill system-level constraints. However, they did not exploit the fact that the concurrent tasks can be executed in an overlapped fashion. In this paper, we propose a simple yet powerful on-line technique that performs task overlapping by run-time subtask re-scheduling. By doing so, a multiprocessor system with concurrent tasks can achieve better performance without extra energy consumption. We have applied our algorithm to a set of randomly-generated task graphs, obtaining encouraging improvements over non-overlapped tasks, and also lower overall energy consumption than a previous DVS method for real-time tasks. Then, we have demonstrated the algorithm on real-life video- and image-processing applications implemented on a dual-processor TI TMS320C6202 board: we have achieved a reduction of 22–29% in the application execution time, while the impact of run-time scheduling overhead proved to be negligible (1.55%).

8.

Hardware/software codesign: a systematic approach targeting data-intensive applications

Wiangtong T. Cheung P.Y.K. Luk W. 《Signal Processing Magazine, IEEE》2005,22(3):14-22

This article presents a systematic approach to hardware/software codesign targeting data-intensive applications. It focuses on application processes that can be represented as directed acyclic graphs (DAGs) and that use a synchronous dataflow (SDF) model, the popular form of dataflow employed in DSP systems, when running the process. The codesign system is based on the UltraSONIC reconfigurable platform, a system designed jointly at Imperial College and the SONY Broadcast Laboratory. This system is modeled as a loosely coupled structure consisting of a single instruction processor and multiple reconfigurable hardware elements. The paper also introduces and demonstrates a task-based hardware/software codesign environment specialized for real-time video applications. Both the automated partitioning and scheduling environment and the task manager program help to provide fast, robust support for demanding applications in the codesign system.

9.

The Agamid design-space exploration framework

Daniel Gregorek Alberto Garcia-Ortiz 《Design Automation for Embedded Systems》2018,22(4):293-314

The emergence of many-core processors raises novel demands on system design. Power limitations and abundant parallelism require efficient and scalable run-time management. The integration of dedicated hardware to enhance the performance of the run-time management system is gaining increasing importance. But the design of a run-time manager for many-core systems generally suffers from exhaustive evaluation time. Previous works do not address the required flexibility or do not achieve a reasonable evaluation time for the simulation framework. We propose the novel simulation framework Agamid to foster the development and evaluation of hardware-enhanced run-time management for many-core systems. Our transaction-level framework performs design point evaluation of hardware-enhanced run-time management for many-core systems at the timescale of seconds. We use a hybrid simulation approach considering the run-time management and the user application at different levels of abstraction. The framework provides a generic run-time manager to compare arbitrary management systems and HW/SW partitionings. The implementation of the run-time manager facilitates direct execution on the host machine and a detailed synchronization model. Agamid applies user application workloads by means of transaction-based task graphs. An extendable system-call interface allows arbitrary interaction between the user application and the run-time management system. The thorough calibration of the RTM timing model enables reasonable approximations of the management overhead. Our evaluation considers the accuracy, wall-time and design space exploration capabilities of Agamid. Our findings substantiate the usefulness of integrating the modeling of the run-time management, hardware architecture and user application into a single transaction-level framework.

10.

A temporal bipartitioning algorithm for dynamically reconfigurable FPGAs

Canto E. Moreno J.M. Cabestany J. Lacadena I. Insenser J.M. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(1):210-218

This paper describes a systematic method to map synchronous digital systems into dynamically reconfigurable programmable logic (i.e., programmable logic able to swap in real time the configuration defining the functionality of the system). The method is based on a temporal bipartitioning technique that is able to separate the static implementation of a circuit into two temporally independent hardware contexts. As the experimental results show, the method is capable of improving the functional density of the dynamic implementation with respect to the static one.

11.

A dynamic partial reconfigurable system with combined task allocation method to improve the reliability of FPGA

《Microelectronics Reliability》2018

Currently most FPGAs use SRAM-based technology, which is susceptible to faults caused by external electromagnetic radiation or by long-term internal overload operation. The dynamic partial reconfigurable (DPR) system, as an emerging technology, provides a promising way to solve this problem by reallocating the tasks in damaged resource areas to non-faulty regions at runtime. Based on this idea, an infrastructure for coordinately executing specialized hardware tasks on a reconfigurable FPGA is presented to achieve the flexibility needed to tolerate faults occurring at runtime. Moreover, a method named MER-3D-Contact that combines the maximum empty rectangles (MER) technique with the adjacency heuristic is proposed to allocate tasks in the dynamic partial reconfiguration system for higher resource utilization, higher task acceptance ratio and lower fragmentation ratio. Finally, experiments are carried out to evaluate the performance of the proposed system. The results show that, in terms of task acceptance ratio, the proposed system achieves an improvement of up to 36% without damaged areas and up to 58% with damaged resources. Thus, the proposed system is expected to find wide application in the field of more reliable FPGAs.

12.

On the hard-real-time scheduling of embedded streaming applications

Mohamed A. Bamakhrama Todor P. Stefanov 《Design Automation for Embedded Systems》2013,17(2):221-249

In this paper, we consider the problem of hard-real-time (HRT) multiprocessor scheduling of embedded streaming applications modeled as acyclic dataflow graphs. Most of the hard-real-time scheduling theory for multiprocessor systems assumes independent periodic or sporadic tasks. Such a simple task model is not directly applicable to dataflow graphs, where nodes represent actors (i.e., tasks) and edges represent data-dependencies. The actors in such graphs have data-dependency constraints and do not necessarily conform to the periodic or sporadic task models. In this work, we prove that the actors in acyclic Cyclo-Static Dataflow (CSDF) graphs can be scheduled as periodic tasks. Moreover, we provide a framework for computing the periodic task parameters (i.e., period and start time) of each actor, and handling sporadic input streams. Furthermore, we define formally a class of CSDF graphs called matched input/output (I/O) rates graphs which represents more than 80% of streaming applications. We prove that strictly periodic scheduling is capable of achieving the maximum achievable throughput of an application for matched I/O rates graphs. Therefore, hard-real-time schedulability analysis can be used to determine the minimum number of processors needed to schedule matched I/O rates applications while delivering the maximum achievable throughput. This can be of great use for system designers during the Design Space Exploration (DSE) phase.

13.

Improving functional density using run-time circuit reconfiguration [FPGAs]

Wirthlin M.J. Hutchings B.L. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1998,6(2):247-256

The ability to provide flexibility and allow fine-grain circuit specialization makes field programmable gate arrays (FPGA's) ideal candidates for computing elements within application-specific architectures. The benefits of gate-level specialization and reconfigurability can be extended by reconfiguring circuit resources at run-time. This technique, termed run-time reconfiguration (RTR), allows the exploitation of dynamic conditions or temporal locality within application-specific problems. For several applications, this technique has been shown to reduce the hardware resources required for computation. The use of this technique on conventional FPGA's, however, requires additional time for circuit reconfiguration. A functional density metric is introduced that balances the advantages of RTR against its associated reconfiguration costs. This metric is used to justify run-time reconfiguration against other more conventional approaches. Several run-time reconfigured applications are presented and analyzed using this approach.
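The abstract does not state the metric itself. As a hedged illustration of how such a trade-off is commonly written, functional density can be taken as the inverse of the area-time product, with reconfiguration time charged to the run-time reconfigured design. The symbols below ($A$ for circuit area, $T_e$ for execution time, $T_c$ for configuration time, subscripts $s$ and $r$ for the static and RTR implementations) are assumptions for this sketch, not notation taken from the listing.

```latex
% Functional density as inverse cost, with cost = area x time (assumed form)
D = \frac{1}{A \, T}

% Static implementation versus run-time reconfigured implementation,
% where the RTR design pays configuration time on top of execution time
D_s = \frac{1}{A_s \, T_{e,s}}, \qquad
D_r = \frac{1}{A_r \, (T_{e,r} + T_{c})}

% RTR is justified under this metric when D_r > D_s, i.e.
A_r \, (T_{e,r} + T_{c}) < A_s \, T_{e,s}
```

Under this reading, the area saved by specialization has to outweigh the extra time spent reconfiguring before RTR comes out ahead.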

14.

Run-time management of systems with partially reconfigurable FPGAs

《Integration, the VLSI Journal》2017

Partial reconfiguration (PR) of FPGAs can be used to dynamically extend and adapt the functionality of computing systems by swapping in and out HW tasks. To coordinate the on-demand task execution, we propose and implement a Run-Time System Manager (RTSM) for scheduling software (SW) tasks on available processor(s) and hardware (HW) tasks on any number of reconfigurable regions (RRs) of a partially reconfigurable FPGA. Fed with the initial partitioning of the application into tasks, the corresponding task graph, and the available task mappings, the RTSM controls system operation considering the status of each task and region (e.g. busy, idle, scheduled for reconfiguration/execution, etc.). Our RTSM supports task reuse and configuration prefetching to minimize reconfigurations, task movement among regions to efficiently manage the FPGA area, and region reservation for future reconfiguration and execution. We validate the correctness and portability of our RTSM by executing an image processing application on two Xilinx-based platforms: ZedBoard and XUPV5. We also perform a more extensive evaluation of its features using a simulation framework, and find that, despite the technology limitations, our approach can give promising results in terms of scheduling quality. Since our RTSM also supports the scheduling of parallel SW tasks, we use it to manage the execution of the entire parallel Edge Detection application on a desktop; we compare the application execution time with that using the OpenMP framework and find that with our RTSM execution is 2.4 times faster than the unoptimized OpenMP version. When processor affinity optimization is enabled for OpenMP, our RTSM and OpenMP are on par, indicating that the scheduling efficiency of our RTSM is competitive with this state-of-the-art scheduler, while additionally supporting the management of HW tasks.

15.

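Task Scheduling Technique for Coarse-Grained Dynamically Reconfigurable Systems Based on Pre-configuration and Configuration Reuse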

戴紫彬 曲彤洲 《电子与信息学报》2019,41(6):1458-1465

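Excessive configuration time is an important factor limiting the overall performance of reconfigurable systems, and a sound task scheduling technique can effectively reduce system configuration time. For coarse-grained dynamically reconfigurable systems (CGDRS) and streaming applications with data dependencies, this paper proposes a three-dimensional task scheduling model. First, based on this model, a task scheduling algorithm based on a pre-configuration strategy (CPSA) is designed. Then, according to the configuration reusability between tasks, interval configuration reuse and continuous configuration reuse strategies are proposed, and the CPSA algorithm is improved accordingly. Experimental results show that the CPSA algorithm effectively resolves scheduling deadlock, reduces the execution time of streaming applications, and raises the scheduling success rate. Compared with other scheduling algorithms, the average improvement in streaming application execution time is 6.13% to 19.53%.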

16.

A method for partitioning applications in hybrid reconfigurable architectures

Michalis D. Galanis Athanasios Milidonis George Theodoridis Dimitrios Soudris Costas E. Goutis 《Design Automation for Embedded Systems》2005,10(1):27-47

In this paper, we propose a methodology for accelerating application segments by partitioning them between reconfigurable hardware blocks of different granularity. Critical parts are sped up on the coarse-grain reconfigurable hardware to meet the timing requirements of application code mapped on the reconfigurable logic. The reconfigurable processing units are embedded in a generic hybrid system architecture which can model a large number of existing heterogeneous reconfigurable platforms. The fine-grain reconfigurable logic is realized by an FPGA unit, while the coarse-grain reconfigurable hardware is realized by our developed high-performance data-path. The methodology mainly consists of three stages: the analysis, the mapping of the application parts onto fine and coarse-grain reconfigurable hardware, and the partitioning engine. A prototype software framework realizes the partitioning flow. In this work, the methodology is validated using five real-life applications. Analytical partitioning experiments show that the speedup relative to the all-FPGA mapping solution ranges from 1.5 to 4.0, while the specified timing constraints are satisfied for all the applications.

17.

A Reconfigurable Array Architecture and Its Task Scheduling Algorithm

郭力 曹超 《信息技术》2011,(5):68-72

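A reconfigurable array architecture is proposed in which computation time can hide configuration time and data transfer time, and a list scheduling algorithm is proposed for task scheduling on this architecture. Hardware/software co-simulation was carried out on the SOCDesigner platform. For IDCT, FFT, and 4×4 matrix multiplication, the new reconfigurable array is on average about 10% faster than the original reconfigurable array.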
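List scheduling, the family of algorithm named in this abstract, can be sketched generically as follows. This is not the paper's scheduler: it assumes identical processing units, positive task durations, and a priority equal to the longest path from a task to the end of the graph, and it does not model configuration or data-transfer overlap. The function name `list_schedule` and its argument layout are hypothetical.

```python
def list_schedule(durations, deps, num_units):
    """Generic list scheduling of a task DAG on identical units.

    durations: {task: positive execution time}
    deps: {task: iterable of predecessor tasks}
    Returns {task: (unit, start, finish)}.
    """
    # Build successor lists so priorities can be computed bottom-up.
    succs = {t: [] for t in durations}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)

    rank = {}

    def upward_rank(t):
        # Longest path from t to any sink, including t's own duration.
        if t not in rank:
            rank[t] = durations[t] + max(
                (upward_rank(s) for s in succs[t]), default=0
            )
        return rank[t]

    # With positive durations, descending rank is a valid topological order.
    order = sorted(durations, key=lambda t: (-upward_rank(t), t))

    unit_free = [0] * num_units  # time at which each unit becomes idle
    finish = {}
    schedule = {}
    for t in order:
        ready = max((finish[p] for p in deps.get(t, ())), default=0)
        # Pick the unit on which the task can start earliest.
        unit = min(range(num_units), key=lambda u: max(unit_free[u], ready))
        start = max(unit_free[unit], ready)
        finish[t] = start + durations[t]
        unit_free[unit] = finish[t]
        schedule[t] = (unit, start, finish[t])
    return schedule
```

A scheduler that hides configuration time behind computation would additionally have to reserve a configuration slot before each task's start, which this sketch leaves out.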

18.

An Optimized Task Partitioning Algorithm Based on the MrsP Protocol

张海涛 张通 张宇辉 管银凤 张凤登 《电子科技》2023,36(3):36-41

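In multiprocessor real-time systems, scheduling and resource sharing are the core problems, and the corresponding scheduling algorithms and shared-resource access protocols directly affect system performance. This requires scheduling algorithms and resource access protocols to exploit the computing power of the hardware platform as far as possible while guaranteeing real-time behavior. However, existing scheduling algorithms mostly assume that tasks are independent and do not consider resource sharing among tasks, while shared-resource access protocols mostly focus on rules and worst-case response time analysis. To address this, the P-RM algorithm is combined with the MrsP protocol to derive an overall schedulability condition for multiprocessor real-time systems. Based on the characteristics of the MrsP protocol, a task partitioning algorithm that reduces blocking time is proposed. By improving how task utilization is calculated, it resolves the double counting of critical sections; compared with previous task partitioning algorithms, it also resolves the double counting of critical sections and the splitting and reallocation of tasks after classification. Experiments show that the algorithm reduces the number of processors required by 15% to 20%.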

19.

Task Scheduling for Context Minimization in Dynamically Reconfigurable Platforms

Nei-Chiung Perng Shih-Hao Hung Chia-Heng Tu 《Journal of Signal Processing Systems》2010,59(1):3-12

Dynamically reconfigurable hardware provides useful means to reduce the time-to-prototype and even the time-to-market in product designs. It also offers a good alternative in reconfiguring hardware logics to optimize the system performance. This paper targets an essential issue in reconfigurable computing, i.e., the minimization of configuration contexts. We explore different constraints on the CONTEXT MINIMIZATION problem. When the resulting subproblems are polynomial-time solvable, optimal algorithms are presented. We also present a greedy algorithm for the CONTEXT MINIMIZATION problem, which is proved NP-complete. The capability of the proposed algorithm is evaluated by a series of experiments.

20.

Dynamic and Partial FPGA Exploitation

Becker J. Hubner M. Hettich G. Constapel R. Eisenmann J. Luka J. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2007,95(2):438-452

Today's field programmable gate array (FPGA) architectures, like Xilinx's Virtex-II series, enable partial and dynamic run-time self-reconfiguration. This feature allows the substitution of parts of a hardware design implemented on this reconfigurable hardware, and therefore, a system can be adapted to the actual demands of applications running on the chip. Exploiting this possibility enables the development of adaptive hardware for a huge variety of applications. A novel method for communication interfaces using look up table (LUT)-based communication primitives enables an exact separation of reconfigurable parts and a fast and intelligent bus-system. A new adaptive software/hardware reconfigurable system is presented in this paper, using a real application in the automotive domain implemented on a Xilinx Virtex-II 3000 FPGA to present results.