Similar literature
 Found 20 similar documents; search took 31 ms
1.
The increasing density of silicon processes, coupled with the development of ever more energy- and space-efficient embedded core designs, has led to multi-processor system-on-chip (MPSoC) designs becoming increasingly attractive for use in embedded systems. Unfortunately, this increase in core count gives rise to an explosion in design space possibilities, especially when heterogeneous designs are considered. To address this problem, new techniques in simulation are required to increase the simulation performance of these systems, while maintaining the accuracy needed to make good design decisions, and to verify the performance characteristics for real-time systems. We present a new high-speed, near cycle-accurate simulator, addressing an important but neglected category of multicore systems: deeply-embedded cache-incoherent MPSoCs. We take advantage of the unique properties of these systems to relax synchronisation constraints and increase the parallelism of the simulation. In doing so we achieve performance not possible using previous simulation techniques, without compromising the accuracy of the results. Quantitative performance results are presented across a large range of simulated MPSoC designs comprising 1–64 cores. On average we simulate at 5.7 MIPS, with simulation speeds reaching 377 MIPS in the best case. Comparing against FPGA implementations, we demonstrate that the simulator manages this with an average timing error of only 2.1%. Applying some of these techniques to coherent simulation enables even coherent 64-core designs to be simulated accurately at up to 2.2 MIPS.

2.
The ever-increasing performance demand of modern embedded applications drives the development of multi-processor system-on-chip (MPSoC) systems in the embedded domain. Today's MPSoC-based products increasingly have to deal with multiple application execution scenarios which may change dynamically at run time. To improve the system performance, a state-of-the-art solution is to dynamically adapt the allocation of system resources at run time for each execution scenario, based on pre-determined resource schemes that have been optimized at design time. However, such approaches will not work well for MPSoC systems that have a large number of execution scenarios and/or frequent run-time variations in execution scenario behavior. In this work, we therefore propose a scalable run-time self-adaptive framework for MPSoC systems that addresses these problems, thereby considerably improving the system efficiency.

3.
RPM enables rapid prototyping of different multiprocessor architectures. It uses hardware emulation for reliable design verification and performance evaluation. The major objective of the RPM project is to develop a common, configurable hardware platform to accurately emulate different MIMD systems with up to eight execution processors. Because emulation is orders of magnitude faster than simulation, an emulator can run problems with large data sets more representative of the workloads for which the target machine is designed. Because an emulation is closer to the target implementation than an abstracted simulation, it can accomplish more reliable performance evaluation and design verification. Finally, an emulator is a real computer with its own I/O; the code running on the emulator is not instrumented. As a result, the emulator looks exactly like the target machine (to the programmer) and can run several different workloads, including code from production compilers, operating systems, databases, and software utilities.

4.
Demanding 3-D scenes on embedded systems push developers to use all the resources of current low-power processors and to add dedicated hardware, such as a graphics accelerator unit, for real-time realistic scene rendering. Photon mapping, one of the most powerful techniques for rendering highly realistic 3-D images, requires a large number of floating-point operations and is therefore very time-consuming. To exploit multiprocessor systems for rendering 3-D scenes, this paper proposes parallel photon-mapping rendering on a homogeneous multiprocessor SoC (MPSoC) platform with a mesh NoC that uses an adaptive wormhole routing method to communicate packets among cores. To make efficient use of the MPSoC platform for photon-mapping rendering, the paper explores several methods for improving load balancing, using memory efficiently, and reducing communication cost in order to achieve a scalable application. The resulting MPSoC platform is verified and evaluated by cycle-accurate simulations for different sizes of the mesh NoC. As expected, the proposed methods obtain excellent load balancing and run up to 44.3 times faster on an 8-by-8 MPSoC platform than on a single-core MPSoC platform.

5.
In order to meet the ever-increasing computing requirements of the embedded market, multiprocessor chips have been proposed as the best way forward. In this work we investigate the energy consumption of these embedded MPSoC systems. One efficient way to reduce energy consumption is to reconfigure the cache memories. This approach has been applied to architectures with one cache level and one processor, but has not yet been investigated for multiprocessor architectures with two-level caches. The main contribution of this paper is to explore multiprocessor architectures with two-level caches (L1/L2) by estimating their energy consumption. Using a simulation platform, we first build a multiprocessor architecture, and then propose a new algorithm that tunes the two-level cache memory hierarchy (L1 and L2). The cache tuning approach is based on three parameters: cache size, line size, and associativity. To find the best cache configuration, the application is divided into several execution intervals, and for each interval we generate the best cache configuration. Finally, the approach is validated using a set of open source benchmarks (SPEC 2006, Splash-2, and MediaBench), and we discuss the performance in terms of speedup and energy reduction.
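
The per-interval tuning over cache size, line size, and associativity described in item 5 can be sketched as an exhaustive search. This is only an illustrative sketch, not the algorithm from the paper: the function names, the validity rule (the size must divide evenly into sets), and the caller-supplied `cost` function standing in for an energy estimate are all assumptions.

```python
from itertools import product

def valid_configs(sizes, line_sizes, assocs):
    """Yield (size, line_size, ways) tuples whose size splits evenly into sets.

    Assumption: a configuration is valid when size is a multiple of
    line_size * ways, i.e. the number of sets is a whole number.
    """
    for size, line, ways in product(sizes, line_sizes, assocs):
        if size % (line * ways) == 0:
            yield (size, line, ways)

def best_per_interval(intervals, configs, cost):
    """Pick, for each execution interval, the configuration with the lowest cost.

    `cost(interval, config)` is a hypothetical caller-supplied estimate
    (for example, simulated energy); ties go to the first configuration.
    """
    configs = list(configs)
    return [min(configs, key=lambda c: cost(iv, c)) for iv in intervals]
```

In practice the cost of each (interval, configuration) pair would come from a simulator run, so the search space is usually pruned rather than enumerated in full.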

6.
Image-based control (IBC) systems are increasingly being used in various domains including autonomous driving. The key challenge in IBC is to deal with high computation demand while guaranteeing performance and safety requirements such as stability. While modern industrial heterogeneous platforms, such as NVIDIA Drive, offer the necessary compute power, application development on these platforms with performance and safety guarantees is still challenging. Alternative time-predictable platforms are not yet in widespread use. A typical design flow for IBC systems consists of three distinct elements: (i) mapping tasks onto platform resources; (ii) timing analysis, consisting of task-level worst-case execution time (WCET) analysis and application-level analysis to obtain worst-case performance bounds on aspects such as latency and throughput; (iii) controller design using the obtained performance bounds, ensuring performance and safety. While such a three-step design process is modular in nature, it usually leads to over-dimensioned systems with sub-optimal performance, because task- and/or application-level timing bounds are pessimistic. We present a scenario- and platform-aware design flow for IBC systems that exploits frequently occurring workload scenarios to optimize performance. For industrial platforms, where tight task-level WCET bounds are difficult to obtain, we moreover propose to use frequently occurring task execution times instead of WCET estimates to obtain tight application-level temporal bounds. During controller design, we then optimize performance and guarantee stability by identifying appropriate system scenarios and by designing a switched controller that switches between those scenarios. We illustrate our method considering a predictable multiprocessor system-on-chip platform - CompSOC. We validate the proposed method using hardware-in-the-loop (HiL) experiments with an industrial heterogeneous multiprocessor platform - NVIDIA Drive PX2 - considering a lane keeping assist system (LKAS). We obtain an improved control performance compared to state-of-the-art IBC design.

7.
Multiprocessor system-on-chip (MPSoC) designs offer a lot of computational power assembled in a compact design. In mobile robotic applications, they offer the chance to replace several dedicated computing boards by a single processor, which typically leads to a significant acceleration of the computer-vision algorithms employed. This enables robots to perform more complex tasks at lower power budgets, less cooling overhead and, ultimately, smaller physical dimensions. However, the presence of shared resources and dynamically varying load situations leads to low throughput and quality for corner detection, an algorithm widely used in computer vision. The contemporary operating systems from the domain have not been designed for the management of highly parallel but shared computing resources. In this paper, we evaluate resource-aware programming as a means to overcome these issues. Our work is based on Invasive Computing, an MPSoC hardware and operating-system design for resource-aware programming. We evaluate this system with real-world algorithms, such as the Harris and Shi–Tomasi corner detectors. Our results indicate that resource-aware programming can lead to significant improvements in the behavior of these detectors, with up to 22 percent improvement in throughput and up to 20 percent improvement in accuracy.

8.
A holistic design and verification environment to investigate driving assistance systems is presented, with an emphasis on system-on-chip architectures for video applications. Starting with an executable specification of a driving assistance application, subsequent transformations are performed across different levels of abstraction until the final implementation is achieved. The hardware/software partitioning is facilitated through the integration of OpenCV and SystemC in the same design environment, as well as OpenCV and Linux in the run-time system. We built a rapid-prototyping, FPGA-based camera system, which allows designs to be explored and evaluated in realistic conditions. Using a lane departure case study and the corresponding performance speedup, we show that our platform reduces the design time while improving the verification effort.

9.
Simulation models have been developed in order to foresee characteristics of networks, systems or protocols when carrying out tests in laboratories is very expensive or even impossible. This paper presents a simulation model of a multiprocessor network traffic analysis system. The model, which is based on closed networks of queues, evaluates the efficiency of the system depending on the hardware/software platform features. This model is therefore able to estimate performance early in the design and development stages by simulating a multiprocessor architecture in charge of analysing network traffic. The accuracy of the model will be checked by comparing analytical results with practical ones obtained in the laboratory using a traffic analysis system that runs on a multiprocessor platform.
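
As a rough illustration of how a closed network of queues (item 9) yields performance estimates, the sketch below uses exact Mean Value Analysis (MVA), a standard technique for such networks. The abstract does not say which solution method the paper uses, so MVA is an assumption here, as are the function name, the single-server queueing stations, and the example service demands.

```python
from typing import List, Sequence, Tuple

def mva(demands: Sequence[float], n_jobs: int,
        think_time: float = 0.0) -> Tuple[float, List[float]]:
    """Exact MVA for a closed, single-class, product-form queueing network.

    demands[k] is the (positive) service demand of station k per job;
    all stations are assumed to be single-server queueing stations.
    Returns (system throughput, mean queue length per station).
    """
    queue = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, n_jobs + 1):
        # Residence time: own service plus waiting behind the jobs already
        # queued when the network holds n - 1 jobs (arrival theorem).
        resid = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(resid))
        # Little's law applied to each station.
        queue = [throughput * r for r in resid]
    return throughput, queue
```

For example, `mva([1.0, 1.0], 1)` models one job circulating between two stations with equal demand; the job spends half its time at each, so the throughput is 0.5 jobs per time unit.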

10.
This paper proposes a novel Colored Petri Net (CPN) based dynamic scheduling scheme, which aims at scheduling real-time tasks on multiprocessor system-on-chip (MPSoC) platforms. Our CPN-based scheme addresses two key issues in task scheduling problems: dependence detection and task dispatching. We model inter-task dependences using CPN, including true dependences, output dependences, anti-dependences and structural dependences. The dependences can be detected automatically during model execution. Additionally, the proposed model takes the checking of real-time constraints into consideration. We evaluated the scheduling scheme on a state-of-the-art FPGA-based multiprocessor hardware system and modeled the system behavior using CPN tools. Simulations and state space analyses are conducted on the model. Experimental results demonstrate that our scheme can achieve 98.9% of the ideal speedup on a real FPGA-based hardware prototype.
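
The three data dependence kinds named in item 10 can be illustrated with plain read/write sets. This sketch does not use Colored Petri Nets and is not the paper's model; the `Task` type and `classify` function are hypothetical names, and structural dependences (contention for hardware resources) are left out.

```python
from typing import FrozenSet, List, NamedTuple

class Task(NamedTuple):
    name: str
    reads: FrozenSet[str]   # variables the task reads
    writes: FrozenSet[str]  # variables the task writes

def classify(earlier: Task, later: Task) -> List[str]:
    """Return the data dependences of `later` on `earlier` (program order)."""
    kinds = []
    if earlier.writes & later.reads:
        kinds.append("true")    # read after write
    if earlier.writes & later.writes:
        kinds.append("output")  # write after write
    if earlier.reads & later.writes:
        kinds.append("anti")    # write after read
    return kinds
```

Two tasks with an empty result can be dispatched to different processors in either order; any non-empty result forces the later task to wait, although output and anti-dependences can in principle be removed by renaming.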

11.
Three-dimensional integrated circuits (3D ICs) are suitable alternatives to traditional two-dimensional (2D) ICs because of their advantages in performance and packaging; they have therefore received considerable attention from researchers. On the other hand, emerging network-on-chip (NoC) based many-core chips provide great potential for running multiple applications simultaneously. However, this approach increases the interference between applications, lowering the performance of each application. Hence, mapping tasks belonging to various applications onto the nodes of an architecture is a very important issue. In this study, based on the partitioning concept, a novel methodology is presented for run-time mapping of multiple applications onto an irregular wireless 3D NoC-based multiprocessor system-on-chip (MPSoC) platform in which each processing element (PE) can support more than one task. In the second algorithm (enhanced irregular-partitioning best neighbor), the partitioning of the network is changed dynamically according to the number of applications running simultaneously, in order to minimize the communication overhead and congestion on the NoC, which leads to more efficient task mapping. The simulation results reveal that the second proposed algorithm (enhanced IPBN), in comparison with the NPBN (non-partitioning best neighbor) algorithm and our first proposed algorithm (basic IPBN), enhances performance by decreasing the total execution time, average hop count, average channel load and energy consumption.

12.
During the past few years, embedded digital systems have been requested to provide a huge amount of processing power and functionality. A very likely foreseeable step to pursue this computational and flexibility trend is the generalization of on-chip multiprocessor platforms (MPSoC). In that context, choosing a programming model and providing optimized hardware support to it on these platforms is a challenging task. To deal in a portable way with MPSoCs having a different number of processors running possibly at different frequencies, work-stealing (WS) based parallelization is a current research trend. The contribution of this paper is to evaluate the impact of some simple MPSoC architecture characteristics on the performance of WS in the MPSoC context. The previous evaluations of WS, either theoretical or experimental, were done on fixed multicore architectures. This work extends these studies by exploring the use of WS for the codesign of embedded applications on MPSoC platforms with different hardware capabilities, thanks to cycle-accurate measurements. We first study the architectural choices suited to WS algorithms and measure the benefit of these architectural modifications. To assess whether WS is suited to the MPSoC context, we experimentally measure its intrinsic implementation overhead on the most efficient architectural designs. Finally, we validate the performance of the approach on two real applications: a regular multimedia application (temporal noise reduction) and an irregular computation-intensive application (frames of the Mandelbrot set). Our results show that enhancing MPSoC platforms having up to 16 processors with widespread hardware support mechanisms can lead to important performance improvements at acceptable hardware cost for the considered applications.

13.
Integrating multicore heterogeneous systems into a system-in-package has challenged many design and test engineers. To overcome these obstacles, we need a common EDA tool for digital, analog, RF, and thermal designs. This article proposes a platform-centric design methodology for modern electronic systems that could incorporate system-on-chip, system-in-package, and system-on-package technologies.

14.
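A review of research progress on task scheduling for multiprocessor systems-on-chip   Total citations: 9, self-citations: 0, citations by others: 9
A multiprocessor system-on-chip integrates processors with multiple instruction sets on a single chip and can perform complex, complete functions; it has broad prospects in application areas such as image processing, networked multimedia, and embedded systems. Task mapping and scheduling is one of the key problems in multiprocessor system-on-chip design. This paper introduces the basic architecture of multiprocessor systems-on-chip and the challenges they face, and examines recent domestic and international research progress on their task scheduling from two perspectives: scheduling algorithm analysis and implementation frameworks. It also analyzes the problems that urgently need to be solved and the main directions for future research, and can serve as a reference for related research on multiprocessor systems-on-chip.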

15.
In this paper, we describe a technique to design UML-based software models for MPSoC architecture, which focuses on the development of the platform specific model of embedded software. To develop the platform specific model, we define a process for the design of UML-based software model and suggest an algorithm with precise actions to map the model to MPSoC architecture. In order to support our design process, we implemented our approach in an integrated tool. Using the tool, we applied our design technique to a target system. We believe that our technique provides several benefits such as improving parallelism of tasks and fast-and-valid mapping of software models to hardware architecture.

16.
The BOAR emulation system is targeted to hardware/software (HW/SW) codevelopment of advanced embedded DSP and telecom systems. The BOAR system addresses the challenge of efficiently customizing programmable hardware and provides a dedicated partitioning routine for target applications and structures, which allows quite high overall system performance. The system allows multiple configurations for communication between processors and field programmable gate arrays (FPGAs), making the BOAR system an efficient tool for real-time HW/SW coverification. The reprogrammable hardware of the emulation tool is based on four Xilinx 4000-series devices, two Texas TMS320C50 signal processors and one Motorola MC68302 microcontroller. With current devices the BOAR hardware provides approximately 40–70 kgates of logic capacity in DSP applications. The emulation capacity can be expanded by connecting several similar boards in a chain. The system also has a versatile internal reprogrammable test environment for test bench development, performance evaluations and design debugging. The logic development environment is based on the Synopsys synthesis tools and automatic design management software, which performs resource mapping and performance-driven design partitioning between FPGAs. The emulation hardware is currently connected to logic and software development environments via an RS-232C bus. The BOAR emulation system has been found to be a very efficient platform for real-life prototyping of different types of DSP algorithms and systems, and for validating correct functionality of a VHDL macro library.

17.
Multithreaded architectures have been proposed for future multiprocessor systems. However, some open issues remain. Can multithreading be supported in a multiprocessor so that it can tolerate synchronization and communication latencies, with little intrusion on the performance of sequentially-executed code? How much does such support contribute to scalable performance when communication and synchronization demands are high? In this paper, we describe the design of EARTH, an architecture which addresses these issues. Each processor in EARTH has an off-the-shelf Execution Unit (EU) for executing threads, and an ASIC Synchronization Unit (SU) supporting dataflow-like thread synchronizations, scheduling, and remote requests. In preparation for an implementation of the SU, we have emulated a basic EARTH model on MANNA 2.0, an existing multiprocessor whose hardware configuration closely matches EARTH. This EARTH-MANNA testbed is fully functional, enabling us to experiment with large benchmarks with impressive speed. With this platform, we demonstrate that multithreading support can be efficiently implemented (with little emulation overhead) in a multiprocessor without a major impact on uniprocessor performance. Also, we measure how much basic multithreading support can help in tolerating increasing communication/synchronization demands.

18.
For the design of classic computers, the parallel programming concept is used to abstract HW/SW interfaces during high-level specification of application software. The software is then adapted to existing multiprocessor platforms using a low-level software layer that implements the programming model. Unlike classic computers, the design of heterogeneous MPSoC also includes building the processors and other kinds of hardware components required to execute the software. In this case, the programming model hides both hardware and software refinements. This paper deals with parallel programming models to abstract both hardware and software interfaces in the case of heterogeneous MPSoC design. Different abstraction levels will be needed. For the long term, the use of higher-level programming models will open new vistas for optimization and architecture exploration, such as CPU/RTOS tradeoffs.

19.
Because embedded systems mostly target mass production and often run on batteries, they should be cheap to realize and power efficient. In addition, they require a high degree of programmability to provide real-time performance for multiple applications and standards. However, performance requirements as well as cost and power-consumption constraints demand that substantial parts of these systems be implemented in dedicated hardware blocks. As a result, their heterogeneous system architecture consists of components ranging from fully programmable processor cores to fully dedicated hardware components for time-critical application tasks. Increasingly, these designs yield heterogeneous embedded multiprocessor systems that reside together on a single chip. The heterogeneity of these highly programmable systems and the varying demands of their target applications greatly complicate system design. The increasing complexity of embedded-system architectures makes predicting performance behavior more difficult. Therefore, having the appropriate tools to explore different choices at an early design stage is increasingly important. The Artemis modeling and simulation environment aims to efficiently explore the design space of heterogeneous embedded-systems architectures at multiple abstraction levels and for a wide range of applications targeting these architectures. The authors describe their application of this methodology in two studies that showed promising results, providing useful feedback on a wide range of design decisions involving the architectures for the two applications.

20.