1.

Run-Time Management of a MPSoC Containing FPGA Fabric Tiles

Nollet V. Avasare P. Eeckhaut H. Verkest D. Corporaal H. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(1):24-33

Multimedia applications, such as 3-D games and video decoders, are typically composed of communicating tasks. Their target embedded computing platforms (e.g., TI OMAP3, IBM Cell) contain multiple heterogeneous processing elements. At application design-time, it is often unknown which applications will execute simultaneously. Hence, resource assignment decisions need to be made by a run-time manager. Run-time assignment of these communicating tasks onto the communication and computation resources of such a multiprocessor platform is a challenging task. In the presence of fine-grain reconfigurable hardware processing elements, the run-time manager also needs to consider the creation of a so-called configuration hierarchy. Instead of executing a dedicated hardware task, the fine-grain reconfigurable hardware fabric hosts a programmable softcore block that, in turn, executes the task functionality. Hence, the next challenge for run-time management is to efficiently handle a configuration hierarchy. This paper details a run-time task assignment heuristic that performs fast and efficient task assignment in a multiprocessor system-on-chip containing fine-grain reconfigurable hardware tiles. In addition, this algorithm is capable of managing a configuration hierarchy. We show that being capable of handling a configuration hierarchy significantly improves the task assignment performance (i.e., success rate and assignment quality). In several cases, adding a configuration hierarchy improves the assignment success rate of the assignment heuristic by 20%.

2.

Prototyping Multiprocessor System-on-Chip Applications: A Platform-Based Approach

Senouci Benaoumeur Bouchhima Aimen Rousseau Frédéric Petrot Frédéric Jerraya Ahmed 《Distributed Systems Online, IEEE》2007,8(5):2-2

Multiprocessor system-on-chip designers face challenges during prototyping, when there is a real need for methods and tools that can easily map applications onto different architectures without tedious redesigning. Such methods and tools must also ensure rapid validation. A new MPSoC prototyping and validation approach uses the POSIX 1003.1c API standard and a reconfigurable multiprocessor hardware platform for fast prototyping of POSIX-based applications.

3.

Systematic MIMO OFDM transceiver implementation for MPSoCs: a nucleus based approach

D. Guenther T. Kempf A. Ishaque G. Ascheid 《Analog Integrated Circuits and Signal Processing》2012,73(2):597-612

In this paper, we analyze the potential as well as the limitations of multiprocessor system-on-chip (MPSoC) platforms when implementing software defined radio (SDR) applications for wireless communications. Suitable MPSoCs contain a potentially heterogeneous multi-core computing cluster and can be further equipped with application specific accelerators. The physical layer of a MIMO OFDM transceiver, for which the IEEE 802.11n standard serves as reference, is investigated in this work. To maintain portability, the platform independent algorithmic kernels (Nuclei) are identified first. In the following case study, efficient implementations (Flavors) of these Nuclei are implemented on an MPSoC platform. Resultant algorithmic performance (e.g., frame-error-rate) as well as the system performance (e.g., latency and throughput) are discussed.

4.

Deterministic reversible MPSoC debugger based on virtual platform execution traces

Marcos Aurélio Pinto Cunha Nicolas Fournel Frédéric Pétrot 《Design Automation for Embedded Systems》2016,20(1):47-63

The increasing complexity of multiprocessor system on chip (MPSoC) makes the software developer's life harder when chasing bugs. The debugging process is particularly tedious as it involves analyzing parallel execution flows. Executing a program many times is an integral part of the process in conventional debugging, but the non-determinism due to parallel execution often leads to different execution paths and different behaviors. In this paper, we propose an approach based on simulation, as it is nowadays an integral part of the MPSoC design flow, to ease pin-pointing bugs in a parallel execution. To that aim, we collect traces using a virtual platform, and when an execution fails, re-execute the traces, in either forward or reverse direction. We define a trace model suitable for this task, and detail a strategy for providing forward and reverse execution features to avoid long simulation times during a debug session. We demonstrate experimentally that re-execution is a deterministic process which, when debugging using the usual trial and error developer approach, is much faster than simulation.

5.

Software Pipeline–Based Partitioning Method with Trade‐Off between Workload Balance and Communication Optimization

Kai Huang Siwen Xiu Min Yu Xiaomeng Zhang Rongjie Yan Xiaolang Yan Zhili Liu 《ETRI Journal》2015,37(3):562-572

For a multiprocessor System‐on‐Chip (MPSoC) to achieve high performance via parallelism, we must consider how to partition a given application into different components and map the components onto multiple processors. In this paper, we propose a software pipeline–based partitioning method with cyclic dependent task management and communication optimization. During task partitioning, simultaneously considering computation load balance and communication optimization can cause interference, which leads to performance loss. To address this issue, we formulate their constraints and apply an integer linear programming approach to find an optimal partitioning result, one that requires a trade‐off between these two factors. Experimental results on a reconfigurable MPSoC platform demonstrate the effectiveness of the proposed method, with 20% to 40% performance improvements compared to a traditional software pipeline–based partitioning method.

6.

Performance Analysis of Arbitration Policies for SoC Communication Architectures

Francesco Poletti Davide Bertozzi Luca Benini Alessandro Bogliolo 《Design Automation for Embedded Systems》2003,8(2-3):189-210

As technology scales toward deep submicron, the integration of a large number of IP blocks on the same silicon die is becoming technically feasible, thus enabling large-scale parallel computations, such as those required for multimedia workloads. The communication architecture is becoming the bottleneck for these multiprocessor Systems-on-Chip (SoC), and efficient contention resolution schemes for managing simultaneous access requests to the shared communication resources are required to prevent system performance degradation. The contribution of this work is to analyze the impact on multiprocessor SoC performance of different bus arbitration policies under different communication patterns, showing the distinctive features of each policy and the strong correlation of their effectiveness with the communication requirements of the applications. Beyond traditional arbitration schemes such as round robin and TDMA, another policy is considered that periodically allocates a temporal slot for contention-free bus utilization to a processor which needs fixed predictable bandwidth for the correct execution of its time-critical task. The results are derived on a complete and scalable multiprocessor SoC simulation platform based on SystemC, whose software support includes a complete embedded multiprocessor OS (RTEMS). The communication architecture is AMBA compliant, and we exploit the flexibility of this multi-master commercial standard, which does not specify the arbitration algorithm, to implement the explored contention resolution schemes.
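
The two traditional arbitration schemes named in this abstract can be illustrated with a minimal sketch. It is not the paper's SystemC implementation: the function names, the request-vector representation, and the rule that an unused TDMA slot simply stays idle are all assumptions made for illustration.

```python
from typing import List, Optional


def round_robin_grant(requests: List[bool], last_granted: int) -> Optional[int]:
    """Grant the first requesting master found after the one granted last."""
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None  # no master is requesting the bus


def tdma_grant(requests: List[bool], cycle: int, slot_owner: List[int]) -> Optional[int]:
    """Grant the bus to the owner of the current time slot, if it is requesting.

    Assumption: an unused slot is left idle rather than reassigned.
    """
    owner = slot_owner[cycle % len(slot_owner)]
    return owner if requests[owner] else None


if __name__ == "__main__":
    requests = [True, False, True]  # masters 0 and 2 want the bus
    print(round_robin_grant(requests, last_granted=0))
    print(tdma_grant(requests, cycle=1, slot_owner=[0, 1, 2]))
```

Round robin adapts to whichever masters are active, while the slot table gives each master a fixed, predictable share of bus time, which is the property the abstract's third policy reserves for a time-critical task.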

7.

Parallel programming models for a multiprocessor SoC platform applied to networking and multimedia

Paulin P.G. Pilkington C. Langevin M. Bensoudane E. Lyonnard D. Benny O. Lavigueur B. Lo D. Beltrame G. Gagne V. Nicolescu G. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(7):667-680

The MultiFlex system is an application-to-platform mapping tool that integrates heterogeneous parallel components (H/W or S/W) into a homogeneous platform programming environment. This leads to higher quality designs through encapsulation and abstraction. Two high-level parallel programming models are supported by the following MultiFlex platform mapping tools: a distributed system object component (DSOC) object-oriented message passing model and a symmetrical multiprocessing (SMP) model using shared memory. We demonstrate the combined use of the MultiFlex multiprocessor mapping tools, supported by high-speed hardware-assisted messaging, context-switching, and dynamic scheduling using the StepNP demonstrator multiprocessor system-on-chip platform, for two representative applications: 1) an Internet traffic management application running at 2.5 Gb/s and 2) an MPEG4 video encoder (VGA resolution, at 30 frames/s). For these applications, a combination of the DSOC and SMP programming models was used in interoperable fashion. After optimization and mapping, processor utilization rates of 85%-91% were demonstrated for the traffic manager. For the MPEG4 decoder, the average processor utilization was 88%.

8.

Multiprocessor SoC design methods and tools

Park H.-W. Oh H. Ha S. 《Signal Processing Magazine, IEEE》2009,26(6):72-79

With the continuous evolution of semiconductor process technology, it is now possible to integrate tens or hundreds of processors in a single chip and make a multiprocessor system-on-chip (MPSoC), or a multicore platform. There are many dual or quad-core CPUs and 100+-core graphics processing units (GPUs) on the desktop computer market, and many MPSoC solutions are also in the embedded computing markets. A key benefit of multicore platforms is scalability in performance and power.

9.

Run-time Task Overlapping on Multiprocessor Platforms

Zhe Ma Francky Catthoor 《Journal of Signal Processing Systems》2010,60(2):169-182

Today’s embedded applications often consist of multiple concurrent tasks. These tasks are decomposed into sub-tasks which are in turn assigned and scheduled on multiple different processors to achieve the Pareto-optimal performance/energy combinations. Previous work introduced systematic approaches to performance-energy trade-off exploration for each individual task and used the exploration results at run-time to fulfill system-level constraints. However, they did not exploit the fact that the concurrent tasks can be executed in an overlapped fashion. In this paper, we propose a simple yet powerful on-line technique that performs task overlapping by run-time subtask re-scheduling. By doing so, a multiprocessor system with concurrent tasks can achieve better performance without extra energy consumption. We have applied our algorithm to a set of randomly-generated task graphs, obtaining encouraging improvements over non-overlapped tasks, and also having less overall energy consumption than a previous DVS method for real-time tasks. Then, we have demonstrated the algorithm on real-life video- and image-processing applications implemented on a dual-processor TI TMS320C6202 board: we have achieved a reduction of 22–29% in the application execution time, while the impact of run-time scheduling overhead proved to be negligible (1.55%).

10.

Multi-Processor SoC-Based Design Methodologies Using Configurable and Extensible Processors

Grant Martin 《Journal of Signal Processing Systems》2008,53(1-2):113-127

The growing interest in multiprocessor system-on-chip (MPSoC) design, or ‘multicore’ processors, has resulted in some confusion between the various types of multiprocessor architectures and their suitability in different application spaces. In particular, there are clear differences between the general-purpose, symmetric multiprocessor (SMP) approaches, and the application-specific, asymmetric multiprocessor (AMP) architectures. Configurable and extensible processors are especially suited for the AMP approach, yet their flexibility means that new design methodologies and tools must be developed to allow effective utilisation of multiple instruction-set processors in a complex design. Configurable and extensible processors are especially well suited for data-intensive computational tasks, such as are found in many signal and image processing applications, including audio, video, and wireless and wired networking. A design methodology for such applications must pay careful attention to the right programming models, and dataflow styles of processing seem a natural fit to the application space. In this paper, we describe a design methodology, flow and tools for MPSoC design using configurable and extensible processors that is especially interesting for data-intensive dataflow style applications. Some of the issues involved in this design approach are used to highlight opportunities for ongoing research.

11.

Application-Specific MPSoC Reliability Optimization

Zhenyu Gu Changyun Zhu Li Shang Dick R.P. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(5):603-608

This paper presents modeling and estimation techniques permitting the temperature-aware optimization of application-specific multiprocessor system-on-chip (MPSoC) reliability. Technology scaling and increasing power densities make MPSoC lifetime reliability problems more severe. MPSoC reliability strongly depends on system-level MPSoC architecture, redundancy, and thermal profile during operation. We propose an efficient temperature-aware MPSoC reliability analysis and prediction technique that enables MPSoC reliability optimization via redundancy and temperature-aware design planning. Reliability, performance, and area are concurrently optimized. Simulation results indicate that the proposed approach has the potential to substantially improve MPSoC system mean time to failure with small area overhead.

12.

System-level design optimization of reliable and low power multiprocessor system-on-chip

Rishad A. Shafik Bashir M. Al-Hashimi Jeff S. Reeve 《Microelectronics Reliability》2012,52(8):1735-1748

In this paper, we study the impact of application task mapping on the reliability of multiprocessor system-on-chip (MPSoC) application in the presence of soft errors. Based on this study, we propose a novel system-level design optimization of an MPSoC application through joint power minimization and reliability improvement. The power minimization is carried out using a voltage scaling technique, while reliability improvement is achieved through careful choice of application task mapping on the homogeneous MPSoC processing cores. The overall aim is to minimize the number of single-event upsets (SEUs) experienced by the MPSoC application for suitably identified voltage scaling of the system processing cores such that the power is reduced and the specified real-time constraint is met. We evaluate the effectiveness of the proposed design optimization using a number of different applications, including MPEG-2 video decoder and synthetic applications. We show that for an MPEG-2 decoder with four processing cores, the proposed soft error-aware optimization produces a design with 38% fewer SEUs than soft error-unaware design optimization for an arbitrary soft error rate of 10^-9, while consuming 9% less power and meeting a given real-time constraint. Furthermore, we investigate the impact of architecture allocation (allocation of processing cores) and show that for an MPSoC with six processing cores and a given real-time constraint, the proposed optimization produces a design with up to 7% fewer SEUs compared to soft error-unaware designs at the cost of 5.5% higher power.

13.

A hierarchical run-time adaptive resource allocation framework for large-scale MPSoC systems

Wei Quan Andy D. Pimentel 《Design Automation for Embedded Systems》2016,20(4):311-339

In the embedded computer system domain, MPSoC systems have become increasingly popular due to the ever-increasing performance demands of modern embedded applications. The number of processing elements in these MPSoCs also steadily increases. Whereas current MPSoCs still contain a limited number of processing elements, future MPSoCs will feature tens up to hundreds of (heterogeneous) processing elements that are all integrated on a single chip. On these future large-scale MPSoC systems, the mapping of applications onto the hardware resources plays an important role to fully explore the parallelism of applications. In this article, a hierarchical run-time adaptive resource allocation framework which uses an intelligent task remapping approach is proposed to improve the system performance for large-scale MPSoCs.

14.

The Agamid design-space exploration framework

Daniel Gregorek Alberto Garcia-Ortiz 《Design Automation for Embedded Systems》2018,22(4):293-314

The emergence of many-core processors raises novel demands to system design. Power limitations and abundant parallelism require efficient and scalable run-time management. The integration of dedicated hardware to enhance the performance of the run-time management system is gaining increasing importance. But the design of a run-time manager for many-core generally suffers from exhaustive evaluation time. Previous works do not address the required flexibility or the need for a reasonable evaluation time of the simulation framework. We propose the novel simulation framework Agamid to foster the development and evaluation of hardware enhanced run-time management for many-core. Our transaction-level framework performs design point evaluation of hardware enhanced run-time management for many-core at the timescale of seconds. We use a hybrid simulation approach considering the run-time management and the user application at different levels of abstraction. The framework provides a generic run-time manager to compare arbitrary management systems and HW/SW partitionings. The implementation of the run-time manager facilitates direct execution at the host machine and a detailed synchronization model. Agamid applies user application workloads by means of transaction-based task graphs. An extendable system-call interface allows arbitrary interaction between the user application and the run-time management system. The thorough calibration of the RTM timing model enables reasonable approximations of the management overhead. Our evaluation considers the accuracy, wall-time and design space exploration capabilities of Agamid. Our findings substantiate the usefulness of integrating the modeling of the run-time management, hardware architecture and user application into a single transaction-level framework.

15.

ARTS: A SystemC-based framework for multiprocessor Systems-on-Chip modelling

Shankar Mahadevan Kashif Virk Jan Madsen 《Design Automation for Embedded Systems》2007,11(4):285-311

One of the challenges of designing a heterogeneous multiprocessor SoC is to find the right partitioning of the application for the target platform architecture. The right partitioning is dependent on the characteristics of the processors and the network connecting them as well as the application. We present an abstract system-level modelling and simulation framework (ARTS) which allows for cross-layer modelling and analysis covering the application layer, middleware layer, and hardware layer. ARTS allows MPSoC designers to explore and analyze the network performance under different traffic and load conditions, consequences of different task mappings to processors (software or hardware) including memory and power usage, and effects of RTOS selection, including scheduling, synchronization and resource allocation policies. We present the application and platform models of ARTS as well as their implementation in SystemC. We present the usage of the ARTS framework as seen from platform developers’ point of view, where new components may be created and integrated into the framework, and from application designers’ point of view, where existing components are used to explore possible implementations. The latter is illustrated through a case study of a real-time, smart phone application consisting of 5 applications with a total of 114 tasks mapped onto different platforms. Finally, we discuss the simulation performance of the ARTS framework in relation to scalability. This work has been partially funded by ARTIST2 (IST-004527).

16.

Implementation of W-CDMA Cell Search on a Highly Parallel and Scalable MPSoC

Roberto Airoldi Tapani Ahonen Fabio Garzia Dragomir Milojevic Jari Nurmi 《Journal of Signal Processing Systems》2011,64(1):137-148

The performance of the W-CDMA cell search algorithm can be significantly improved using homogeneous general purpose Multi-Processor System-on-Chip (MPSoC) architectures. The application also scales well, as the number of processing nodes increases, allowing practical accelerations to become close to the theoretical maximum. In this work we describe a template MPSoC architecture based on multiprocessor computational clusters, called Ninesilica. Each Ninesilica consists of nine processing nodes based on the COFFEE RISC architecture. MPSoC inter- and intra-cluster communication are enabled using hierarchical Network-on-Chip with dedicated point to point and broadcast communication services for better performance. The proposed template has been used to instantiate complete systems with one and four Ninesilica clusters, resulting in MPSoCs with respectively 9 and 36 computational nodes. The MPSoCs have been physically prototyped on an FPGA device, and the W-CDMA cell search algorithm has been mapped on both MPSoC platforms. The four Ninesilica MPSoC can execute W-CDMA in 20.5 ms (at 115 MHz, slow mode implementation) with a total speed-up of 24.3X and 3.3X when compared to a single processing core system and to a single Ninesilica cluster, respectively.
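
The speed-up figures quoted in this abstract can be turned into rough parallel-efficiency numbers. The short calculation below is only a reading aid: the single-cluster speed-up and both efficiency values are derived here from the two reported ratios, on the assumption that they were measured under the same conditions, and are not figures reported by the authors.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of the ideal linear speed-up achieved on `nodes` processing nodes."""
    return speedup / nodes


# Ratios reported in the abstract.
four_cluster_vs_single_core = 24.3   # 36 nodes versus one core
four_cluster_vs_one_cluster = 3.3    # 36 nodes versus one 9-node cluster

# Derived (not reported): implied speed-up of one cluster over a single core.
one_cluster_vs_single_core = four_cluster_vs_single_core / four_cluster_vs_one_cluster

if __name__ == "__main__":
    print(round(one_cluster_vs_single_core, 2))
    print(round(parallel_efficiency(one_cluster_vs_single_core, 9), 3))
    print(round(parallel_efficiency(four_cluster_vs_single_core, 36), 3))
```

Under that assumption, one cluster is roughly 7.4 times faster than a single core (about 82% efficiency on 9 nodes), and the four-cluster system reaches about 68% efficiency on 36 nodes.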

17.

Quantitative Analysis of FPGA-based Database Searching

N. Shirazi D. Benyamin W. Luk P.Y.K. Cheung S. Guo 《The Journal of VLSI Signal Processing》2001,28(1-2):85-96

This paper reports two contributions to the theory and practice of using reconfigurable hardware to implement search engines based on hashing techniques. The first contribution concerns technology-independent optimisations involving run-time reconfiguration of the hash functions; a quantitative framework is developed for estimating design trade-offs, such as the amount of temporary storage versus reconfiguration time. The second contribution concerns methods for optimising implementations in Xilinx FPGA technology, which achieve different trade-offs in cell utilisation, reconfiguration time and critical path delay; quantitative analysis of these trade-offs is provided.

18.

Evaluating object DBMSs for multimedia

Pazandak P. Srivastava J. 《Multimedia, IEEE》1997,4(3):34-49

We describe functionality for determining an object database management system's suitability for developing multimedia applications. We discuss all levels of hardware and software support, as even the most ideal database software cannot operate independent of operating systems, networks and hardware. A review of the multimedia support provided by current commercial and research object database management systems is also included.

19.

Towards a better-than-best-effort forwarding service for multimedia flows

Jeffay K. 《Multimedia, IEEE》1999,6(4):84-87

A salient requirement of interactive multimedia applications is that they transmit data continuously at uniform rates with minimum possible end-to-end delay. The majority of these applications do not require hard and fast guarantees of network performance, but the current best-effort forwarding model of the Internet is frequently insufficient for realizing these requirements. Worse still, the requirement of uniform-rate transmission puts many multimedia applications at odds with current and proposed Internet network management practices that assume or require TCP-like reactions to packet loss. We are investigating router-based active queue management, specifically the use of queue occupancy thresholds to isolate TCP flows and to provide a better-than-best-effort forwarding service for flows in need of uniform-rate transmissions. Our current scheme, class-based thresholds (CBT), relies on a packet marking mechanism such as those proposed for realizing differentiated services on the Internet. CBT, when combined with existing active router queue management schemes such as random early detection (RED), provides a performance for TCP that approximates that achievable under a packet scheduling scheme and acceptable performance for multimedia flows. CBT is a simple and efficient mechanism with implementation complexity and run-time overhead comparable to that of RED.
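
The idea of per-class queue occupancy thresholds described in this abstract can be sketched as follows. This is a simplified illustration, not the CBT algorithm as published: the class name `ThresholdQueue`, the use of instantaneous rather than averaged occupancy, and the omission of RED for the unthresholded class are all assumptions.

```python
from collections import deque
from typing import Deque, Dict, Optional


class ThresholdQueue:
    """Shared FIFO queue in which some traffic classes have an occupancy limit."""

    def __init__(self, thresholds: Dict[str, int]) -> None:
        self.thresholds = thresholds                      # class name -> max packets queued
        self.occupancy: Dict[str, int] = {cls: 0 for cls in thresholds}
        self.queue: Deque[str] = deque()                  # stores each packet's class name

    def enqueue(self, cls: str) -> bool:
        """Queue a packet of class `cls`; return False if it is dropped."""
        limit = self.thresholds.get(cls)
        if limit is not None:
            if self.occupancy[cls] >= limit:
                return False                              # class is over its threshold
            self.occupancy[cls] += 1
        self.queue.append(cls)                            # unthresholded classes always pass here
        return True

    def dequeue(self) -> Optional[str]:
        """Forward the packet at the head of the queue, if any."""
        if not self.queue:
            return None
        cls = self.queue.popleft()
        if cls in self.occupancy:
            self.occupancy[cls] -= 1
        return cls


if __name__ == "__main__":
    q = ThresholdQueue({"multimedia": 2})
    print([q.enqueue("multimedia") for _ in range(3)])
    print(q.enqueue("tcp"))
```

Because a thresholded class can never hold more than its share of the queue, unresponsive uniform-rate flows cannot crowd out TCP packets, which is the isolation effect the abstract describes.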

20.

Cognitive Radio Design on an MPSoC Reconfigurable Platform

Qiwei Zhang André B. J. Kokkeler Gerard J. M. Smit 《Mobile Networks and Applications》2008,13(5):424-430

Cognitive Radio has been proposed as a promising technology for solving today’s spectrum scarcity problem by means of dynamic spectrum access. The multiprocessor system-on-chip (MPSoC) reconfigurable platform is proposed as an enabling technology for cognitive radio. In this paper, we propose a design methodology based on task transaction level interface for the design of cognitive radio baseband on an MPSoC reconfigurable platform. The reconfiguration of a novel, low-complexity fast Fourier transform for orthogonal frequency-division multiplexing based Cognitive Radio is used as a design case to show the effectiveness of the methodology for modelling the dynamic behavior of Cognitive Radio and facilitating the platform implementation.
