1.

Multi-Processor SoC-Based Design Methodologies Using Configurable and Extensible Processors

Grant Martin 《Journal of Signal Processing Systems》2008,53(1-2):113-127

The growing interest in multiprocessor system-on-chip (MPSoC) design, or ‘multicore’ processors, has resulted in some confusion between the various types of multiprocessor architectures and their suitability in different application spaces. In particular, there are clear differences between the general-purpose, symmetric multiprocessor (SMP) approaches, and the application-specific, asymmetric multiprocessor (AMP) architectures. Configurable and extensible processors are especially suited for the AMP approach, yet their flexibility means that new design methodologies and tools must be developed to allow effective utilisation of multiple instruction-set processors in a complex design. Configurable and extensible processors are especially well suited for data-intensive computational tasks, such as are found in many signal and image processing applications, including audio, video, and wireless and wired networking. A design methodology for such applications must pay careful attention to the right programming models, and dataflow styles of processing seem a natural fit to the application space. In this paper, we describe a design methodology, flow and tools for MPSoC design using configurable and extensible processors that is especially interesting for data-intensive dataflow style applications. Some of the issues involved in this design approach are used to highlight opportunities for ongoing research.

2.

Design and Implementation of an Ethernet Interface Based on MPSoC (total citations: 1; self-citations: 0; citations by others: 1)

李桦林宋同晶赵成伟《电子科技》2011,24(12):106-108,132

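This paper studies Ethernet data communication in multicore systems and designs a hardware interface between an Ethernet IP core and the MPSoC network resources. It describes the function and design method of each module in the design. Simulation and FPGA verification results show that data communication through the Ethernet interface is real-time and achieves high throughput. The design enables information transfer between the multicore system and network data, and the hardware is structurally simple, stable, and reliable.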

3.

Multiprocessor system-on-chip technology

Wolf W. 《Signal Processing Magazine, IEEE》2009,26(6):50-54

Signal processing is a prime application for very large scale integration (VLSI) technology and systems-on-chips (SoCs), so it should be no surprise that a great deal of effort has been put into the design of architectures for signal processing. The need for programmability, real-time performance, and low-power operation are all driving factors in the development of these architectures. Although multicore processors have recently emerged for desktop and server computing, single-chip multiprocessors have a much longer history in embedded computing thanks to the strict requirements placed on these systems. Multiprocessor SoCs (MPSoCs) have been developed in response to the needs of embedded signal processing and multimedia computing. This article surveys the requirements on embedded signal processing systems and how those requirements are reflected in MPSoC architectures and the software developed for them.

4.

A Noise-Robust Convex-Optimized Positioning System Based on Code-Aided RSS Estimation and Virtual Base Station Transform

Li-Hong Huang Kai-Ting Shr Ming-Hung Lin Yuan-Hao Huang 《Journal of Signal Processing Systems》2016,83(3):309-328

Parallelization of Digital Signal Processing (DSP) software is an important trend in Multiprocessor System-on-Chip (MPSoC) implementation. The performance of DSP systems composed of parallelized computations depends on the scheduling technique, which must in general allocate computation and communication resources for competing tasks, and ensure that data dependencies are satisfied. In this paper, we formulate a new type of parallel task scheduling problem called Parallel Actor Scheduling (PAS) for MPSoC mapping of DSP systems that are represented as Synchronous Dataflow (SDF) graphs. In contrast to traditional SDF-based scheduling techniques, which focus on exploiting graph level (inter-actor) parallelism, the PAS problem targets the integrated exploitation of both intra- and inter-actor parallelism for platforms in which individual actors can be parallelized across multiple processing units. We first address a special case of the PAS problem in which all of the actors in the DSP application or subsystem being optimized are parallel actors (i.e., they can be parallelized to exploit multiple cores). For this special case, we develop and experimentally evaluate a two-phase scheduling framework with three work flows that involve particle swarm optimization (PSO): PSO with a mixed integer programming formulation, PSO with simulated annealing, and PSO with a fast heuristic based on list scheduling. Then, we extend our scheduling framework to support the general PAS problem, which considers both parallel actors and sequential actors (actors that cannot be parallelized) in an integrated manner. We demonstrate that our PAS-targeted scheduling framework provides a useful range of trade-offs between synthesis time requirements and the quality of the derived solutions. We also demonstrate the performance of our scheduling framework from two aspects: simulations on a diverse set of randomly generated SDF graphs, and implementations of an image processing application and a software defined radio benchmark on a state-of-the-art multicore DSP platform.

5.

LLVMVF: A Generic Approach for Verification of Multicore Software

Marcelo Sousa Alper Sen 《Journal of Electronic Testing》2013,29(5):635-646

The proliferation of multicore hardware has boosted the need for verification of the multicore software that runs on this hardware. Multicore software demands new verification techniques different from the ones used for sequential software. Many optimized compiler frameworks are arising to address the complexities of multicore software. Among these compilers, Low Level Virtual Machine (LLVM) is especially gaining popularity because it has i) a universal front-end that can read many different input languages, ii) aggressive optimizations to improve code performance and quality, and iii) a well-defined intermediate bytecode representation, called LLVM IR, that allows a unified intermediate representation. In this work, we present a novel framework, called LLVM Verification Framework (LLVMVF), implemented in a purely functional language for verification of multicore software. To our knowledge, this is the first verification framework using the LLVM bytecode representation for multicore software. We present an SMT-based Bounded Model Checker backend of LLVMVF and perform initial experiments on multicore software using the Pthreads library. Furthermore, we compare our results with an existing multicore software verification tool.

6.

Parallel application sampling for accelerating MPSoC simulation

Melhem Tawk Khaled Z. Ibrahim Smail Niar 《Design Automation for Embedded Systems》2010,14(4):367-387

Multi-processor system-on-chip (MPSoC) simulators are many orders of magnitude slower than the hardware they simulate due to increasing architectural complexity. In this paper, we propose a new application sampling technique to accelerate the simulation of MPSoC design space exploration (DSE). The proposed technique dynamically combines simultaneously executed phases, thus generating a sampling unit. This technique accelerates the simulation by allowing the repeated combinations of parallel phases to be skipped. A complementary technique, called cluster synthesis, is also proposed to improve the simulation acceleration when the number of possible phase combinations increases. Our experimental results show that this technique can accelerate the simulation up to a factor of 800 with a relatively small estimation error.

7.

Application-Specific MPSoC Reliability Optimization

Zhenyu Gu Changyun Zhu Li Shang Dick R.P. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(5):603-608

This paper presents modeling and estimation techniques permitting the temperature-aware optimization of application-specific multiprocessor system-on-chip (MPSoC) reliability. Technology scaling and increasing power densities make MPSoC lifetime reliability problems more severe. MPSoC reliability strongly depends on system-level MPSoC architecture, redundancy, and thermal profile during operation. We propose an efficient temperature-aware MPSoC reliability analysis and prediction technique that enables MPSoC reliability optimization via redundancy and temperature-aware design planning. Reliability, performance, and area are concurrently optimized. Simulation results indicate that the proposed approach has the potential to substantially improve MPSoC system mean time to failure with small area overhead.

8.

Deterministic reversible MPSoC debugger based on virtual platform execution traces

Marcos Aurélio Pinto Cunha Nicolas Fournel Frédéric Pétrot 《Design Automation for Embedded Systems》2016,20(1):47-63

The increasing complexity of multiprocessor systems on chip (MPSoCs) makes software developers' lives harder when chasing bugs. The debugging process is particularly tedious as it involves analyzing parallel execution flows. Executing a program many times is an integral part of the process in conventional debugging, but the non-determinism due to parallel execution often leads to different execution paths and different behaviors. In this paper, we propose an approach based on simulation, as it is nowadays an integral part of the MPSoC design flow, to ease pinpointing bugs in a parallel execution. To that aim, we collect traces using a virtual platform, and when an execution fails, re-execute the traces, in either forward or reverse direction. We define a trace model suitable for this task, and detail a strategy for providing forward and reverse execution features to avoid long simulation times during a debug session. We demonstrate experimentally that re-execution is a deterministic process which, when debugging using the usual trial and error developer approach, is much faster than simulation.

9.

Validating Power Architecture™ Technology-Based MPSoCs Through Executable Specifications

Bhadra J. Trofimova E. Abadir M.S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(4):388-396

Multiprocessor systems-on-chip (MPSoC) pose a considerable validation challenge due to their size and complexity. We approach the problem of MPSoC validation through a tool that employs a reusable abstract executable specification written in C++. The tool effectively leverages a simulation-based, trace-driven mechanism. Traces are computed by simulating a system level register-transfer level (RTL) implementation of an MPSoC. The tool then analyzes the traces for correctness by checking them across executions of the abstract executable specification. We have effectively used the tool on various live MPSoC design projects based on the Power Architecture technology (The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.). We demonstrate the effectiveness of the technique through results from these projects where we uncovered a number of design errors not found by any other technique.

10.

Design and Research of Heterogeneous Multiprocessor Systems-on-Chip

邵利群张文婷《中国集成电路》2008,17(3):49-52

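As integrated-circuit process feature sizes shrink and circuit scale keeps growing, the number of devices integrated on a single chip is increasing exponentially. Traditional SoC architectures have run into bottlenecks in improving overall system performance, and multicore system design is becoming one of the current research hotspots in integrated-circuit design. Symmetric multiprocessor systems-on-chip can greatly increase system parallelism, but they do not deliver optimal performance in some complex application domains. By integrating several different processor cores on a single chip, this paper studies the technical advantages of heterogeneous multicore systems over homogeneous ones.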

11.

An object oriented model scheduling for media-SoC

Xingmei Cheng Yingbiao Yao Yixiong Zhang Peng Liu Qingdong Yao 《电子科学学刊(英文版)》2009,26(2):244-251

This paper proposes an object-oriented scheduling model for parallel computing in media MultiProcessor Systems on Chip (MPSoC). First, the approach uses the Coarse Grain Data Flow Graph (CGDFG) parallel programming model. Second, it provides a unified abstraction for software objects implemented in processors and hardware objects implemented in ASICs, which eases mapping CGDFG programs onto the MPSoC. The approach cuts kernel overhead and reduces code size effectively. The paper analyzes the principle of the object-oriented model, the scheduling method, and how to map a parallel program onto the MPSoC through CGDFG. It also compares code size and execution cycles with conventional control-flow scheduling, and presents the respective management overhead for one application in a media SoC.

12.

Implementation of W-CDMA Cell Search on a Highly Parallel and Scalable MPSoC

Roberto Airoldi Tapani Ahonen Fabio Garzia Dragomir Milojevic Jari Nurmi 《Journal of Signal Processing Systems》2011,64(1):137-148

The performance of the W-CDMA cell search algorithm can be significantly improved using homogeneous general purpose Multi-Processor System-on-Chip (MPSoC) architectures. The application also scales well as the number of processing nodes increases, allowing practical accelerations to come close to the theoretical maximum. In this work we describe a template MPSoC architecture based on multiprocessor computational clusters, called Ninesilica. Each Ninesilica consists of nine processing nodes based on the COFFEE RISC architecture. MPSoC inter- and intra-cluster communication is enabled using a hierarchical Network-on-Chip with dedicated point-to-point and broadcast communication services for better performance. The proposed template has been used to instantiate complete systems with one and four Ninesilica clusters, resulting in MPSoCs with respectively 9 and 36 computational nodes. The MPSoCs have been physically prototyped on an FPGA device, and the W-CDMA cell search algorithm has been mapped on both MPSoC platforms. The four-Ninesilica MPSoC can execute W-CDMA in 20.5 ms (at 115 MHz, slow mode implementation) with a total speed-up of 24.3X and 3.3X when compared to a single processing core system and to a single Ninesilica cluster, respectively.
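The speed-up figures quoted in this abstract can be turned into parallel efficiencies with a little arithmetic. The following is only a sketch: the function name `parallel_efficiency` is hypothetical, and it assumes that both quoted speed-ups refer to the same workload and the same slow-mode configuration, which the abstract does not state explicitly. The single-cluster speed-up over one core is derived here, not reported in the abstract.

```python
def parallel_efficiency(speedup: float, n_nodes: int) -> float:
    """Fraction of the ideal linear speed-up that is actually achieved."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    return speedup / n_nodes


# Figures quoted in the abstract.
SPEEDUP_36_VS_1 = 24.3  # four clusters (36 nodes) vs. a single core
SPEEDUP_36_VS_9 = 3.3   # four clusters (36 nodes) vs. one cluster (9 nodes)

# Derived under the stated assumption: one cluster (9 nodes) vs. a single core.
speedup_9_vs_1 = SPEEDUP_36_VS_1 / SPEEDUP_36_VS_9

eff_36 = parallel_efficiency(SPEEDUP_36_VS_1, 36)
eff_9 = parallel_efficiency(speedup_9_vs_1, 9)

print(f"implied 9-node speed-up: {speedup_9_vs_1:.2f}X")
print(f"efficiency on 36 nodes:  {eff_36:.3f}")
print(f"efficiency on 9 nodes:   {eff_9:.3f}")
```

Under these assumptions the 36-node system reaches about 68% of ideal linear speed-up and a single cluster about 82%.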

13.

Research on a Multicore Processor Performance Model Based on an Extension of Amdahl's Law

冯晓戴紫彬蔡路亭李伟《电子学报》2017,45(6):1424

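By introducing factors such as application parallelism characteristics, communication overhead, and resource constraints, a multicore processor performance model based on an extension of Amdahl's law is established. Simulating over the model parameters to search the design space of application-specific multicore processors yields the following rules: enlarging the compute cores can achieve superlinear speedup; heterogeneous structures should be preferred; designing a multi-process, large-capacity shared communication region can reduce inter-core communication overhead; and the number and size of compute cores are determined by the application's degree of parallelism, the proportion of each parallel part, and the design scale.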
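For orientation, the classic form of Amdahl's law that this entry extends can be sketched in a few lines. The first function is the standard law. The second adds a communication term purely as an illustration: the linear per-core overhead and the names `speedup_with_comm` and `comm_overhead_per_core` are assumptions made here, not the model proposed in the paper.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Classic Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def speedup_with_comm(parallel_fraction: float, n_cores: int,
                      comm_overhead_per_core: float) -> float:
    """Illustrative extension (an assumption, not the paper's model):
    each additional core adds a fixed fraction of communication time."""
    if comm_overhead_per_core < 0.0:
        raise ValueError("comm_overhead_per_core must be non-negative")
    base_time = 1.0 / amdahl_speedup(parallel_fraction, n_cores)
    comm_time = comm_overhead_per_core * (n_cores - 1)
    return 1.0 / (base_time + comm_time)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        ideal = amdahl_speedup(0.95, n)
        with_comm = speedup_with_comm(0.95, n, 0.002)
        print(f"{n:3d} cores: Amdahl {ideal:6.2f}X, with comm {with_comm:6.2f}X")
```

With a non-zero communication term the speedup peaks at a finite core count and then declines, which is one reason a model of this kind is useful for choosing the number of cores.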

14.

Message-Passing Programming for Embedded Multicore Signal-Processing Platforms

Shih-Hao Hung Po-Hsun Chiu Chia-Heng Tu Wei-Ting Chou Wen-Long Yang 《Journal of Signal Processing Systems》2014,75(2):123-139

Recently, embedded multicore platforms have become popular for signal processing, but software development for such platforms is still very slow. First, parallel programming is more challenging than sequential programming for average programmers. To make the problem worse, software is not portable among the platforms, since each multicore signal-processing platform offers its own programming interface/language. We believe this problem can be relieved by adding support for a standard message-passing programming model to embedded multicore platforms. In particular, we would like to leverage MPI, the most successful message-passing system, which practically enables the development of portable applications to run on many parallel machines. There are technical challenges in supporting MPI on embedded multicore platforms: the size of the library, architecture issues, and performance issues. This paper identifies and addresses these issues. To enable the reuse of existing MPI programs and make message-passing programming portable and efficient, we designed a light-weight MPI-like message-passing library with a three-layer modular design, where the top two layers are mostly platform-independent, and the bottom layer enables platform-specific optimizations. This approach has allowed us to effectively support message-passing on several popular embedded multicore signal-processing platforms, including the IBM CELL and the ITRI PAC Duo. Our results show that message-passing programming is a viable solution for multicore signal processing applications and may be considered by platform vendors.

15.

Memory-Access- and User-Behavior-Aware Application Mapping for MPSoC (total citations: 1; self-citations: 0; citations by others: 1)

王一拙左琦计卫星王小军石峰《电子学报》2015,43(4):631-638

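Application mapping is a key problem in MPSoC design. For MPSoCs running multi-application workloads, this paper proposes a dynamic mapping strategy that is aware of both memory access and user behavior. The strategy distinguishes hot from non-hot applications according to their data-access characteristics, models user behavior, and uses the user-behavior model to further distinguish critical from non-critical applications at run time. For each application entering the system, an online mapping algorithm is selected dynamically according to the application's hotness and criticality classification: hot applications are placed around the memory, while non-hot applications avoid occupying resources near the memory as far as possible; for critical applications, intra-application communication overhead and link contention are minimized, and for non-critical applications, inter-application communication overhead and link contention are minimized. Experiments show that, compared with mapping strategies that consider only memory access or only user behavior, the proposed strategy reduces the overall communication energy consumption of the system.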

16.

Emulation-based transient thermal modeling of 2D/3D systems-on-chip with active cooling

Pablo G. Del Valle David Atienza 《Microelectronics Journal》2011,42(4):564-571

State-of-the-art devices in the consumer electronics market are relying more and more on Multi-Processor Systems-On-Chip (MPSoCs) as an efficient solution to meet their multiple design constraints, such as low cost, low power consumption, high performance and short time-to-market. In fact, as technology scales down, logic density and power density increase, generating hot spots that seriously affect the MPSoC performance and can physically damage the final system behavior. Moreover, forthcoming three-dimensional (3D) MPSoCs can achieve higher system integration density, but the aforementioned thermal problems are seriously aggravated. Thus, new thermal exploration tools are needed to study the temperature variation effects inside 3D MPSoCs. In this paper, we present a novel approach for fast transient thermal modeling and analysis of 3D MPSoCs with active (liquid) cooling solutions, while capturing the hardware-software interaction. In order to preserve both accuracy and speed, we propose a closed-loop framework that combines the use of Field Programmable Gate Arrays (FPGAs) to emulate the hardware components of 2D/3D MPSoC platforms with a highly optimized thermal simulator, which uses an RC-based linear thermal model to analyze the liquid flow. The proposed framework offers speed-ups of more than three orders of magnitude when compared to cycle-accurate 3D MPSoC thermal simulators. Thus, this approach enables MPSoC designers to validate different hardware- and software-based 3D thermal management policies in real-time, and while running real-life applications, including liquid cooling injection control.

17.

System-level design optimization of reliable and low power multiprocessor system-on-chip

Rishad A. Shafik Bashir M. Al-Hashimi Jeff S. Reeve 《Microelectronics Reliability》2012,52(8):1735-1748

In this paper, we study the impact of application task mapping on the reliability of multiprocessor system-on-chip (MPSoC) applications in the presence of soft errors. Based on this study, we propose a novel system-level design optimization of an MPSoC application through joint power minimization and reliability improvement. The power minimization is carried out using a voltage scaling technique, while reliability improvement is achieved through careful choice of application task mapping on the homogeneous MPSoC processing cores. The overall aim is to minimize the number of single-event upsets (SEUs) experienced by the MPSoC application for suitably identified voltage scaling of the system processing cores such that the power is reduced and the specified real-time constraint is met. We evaluate the effectiveness of the proposed design optimization using a number of different applications, including an MPEG-2 video decoder and synthetic applications. We show that for an MPEG-2 decoder with four processing cores, the proposed soft error-aware optimization produces a design with 38% fewer SEUs than soft error-unaware design optimization for an arbitrary soft error rate of 10^-9, while consuming 9% less power and meeting a given real-time constraint. Furthermore, we investigate the impact of architecture allocation (allocation of processing cores) and show that for an MPSoC with six processing cores and a given real-time constraint, the proposed optimization produces a design with up to 7% fewer SEUs compared to soft error-unaware designs at the cost of 5.5% higher power.

18.

Several Key Issues in Multicore Software and Their Research Progress (total citations: 4; self-citations: 2; citations by others: 2)

杨际祥谭国真王荣生《电子学报》2010,38(9):2140-2146

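Improving application development productivity while obtaining parallel performance gains is the core goal of research on mainstream multicore parallel computing. Taking an application-driven, top-down approach, this paper surveys three key issues that affect this goal. First, it compares current application-driven multicore research and reviews the state of multicore application research. Second, it compares the research ideas behind current multicore programming models with respect to productivity programming and performance-enabling programming. Third, it reviews the state of research on multicore algorithms and multicore computation models. Finally, it analyzes future research problems in multicore software.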

19.

Design and Synthesis of a Multiprocessor System-on-Chip Architecture for Real-Time Biomedical Signal Processing in Gamma Cameras

Kai Sun Meng Wang Zili Shao Hui Liu Hongxing Wei Tianmiao Wang 《Journal of Signal Processing Systems》2010,59(1):71-83

MPSoC (Multi-Processor System-on-Chip) architecture is becoming increasingly used because it can provide designers many more opportunities to meet specific performance and power goals. In this paper, we propose an MPSoC architecture for implementing real-time signal processing in gamma cameras. Based on a full analysis of the characteristics of the application, we design several algorithms to optimize the system in terms of processing speed, power consumption, and area cost. Two types of DSP core have been designed for the integral algorithm and the coordinate algorithm, the key parts of signal processing in a gamma camera. An interconnection synthesis algorithm is proposed to reduce the area cost of the Network-on-Chip. We implement our MPSoC architecture on FPGA, and synthesize DSP cores and Network-on-Chip using Synopsys Design Compiler with a UMC 0.18 μm standard cell library. The results show that our technique can effectively accelerate the processing and satisfy the requirements of real-time signal processing for 256 × 256 image construction.

20.

Systematic MIMO OFDM transceiver implementation for MPSoCs: a nucleus based approach

D. Guenther T. Kempf A. Ishaque G. Ascheid 《Analog Integrated Circuits and Signal Processing》2012,73(2):597-612

In this paper, we analyze the potential as well as the limitations of multiprocessor system-on-chip (MPSoC) platforms when implementing software defined radio (SDR) applications for wireless communications. Suitable MPSoCs contain a potentially heterogeneous multi-core computing cluster and can be further equipped with application specific accelerators. The physical layer of a MIMO OFDM transceiver, for which the IEEE 802.11n standard serves as reference, is investigated in this work. To maintain portability, the platform independent algorithmic kernels (Nuclei) are identified first. In the following case study, efficient implementations (Flavors) of these Nuclei are implemented on an MPSoC platform. Resultant algorithmic performance (e.g., frame-error-rate) as well as the system performance (e.g., latency and throughput) are discussed.