首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The ever-increasing complexity of MPSoCs is putting the production of software on the critical path in embedded system development. Several programming models and tools have been proposed in the recent past that aim to facilitate application development for embedded MPSoCs. OpenMP is a mature and easy-to-use standard for shared memory programming, which has recently been successfully adopted in embedded MPSoC programming as well. To achieve performance, however, it is necessary that the implementation of OpenMP constructs efficiently exploits the many peculiarities of MPSoC hardware, and that custom features are provided to the programmer to control it. In this paper we consider a representative template of a modern multi-cluster embedded MPSoC and present an extensive evaluation of the cost associated with supporting OpenMP on such a machine, investigating several implementation variants that are aware of the memory hierarchy and of the heterogeneous interconnection.  相似文献   

2.
Nested OpenMP parallelism allows an application to spawn teams of nested threads. This hierarchical nature of thread creation and usage poses problems for performance measurement tools that must determine thread context to properly maintain per-thread performance data. In this paper we describe the problem and a novel solution for identifying threads uniquely. Our approach has been implemented in the TAU performance system and has been successfully used in profiling and tracing OpenMP applications with nested parallelism. We also describe how extensions to the OpenMP standard can help tool developers uniquely identify threads.  相似文献   

3.
数组私有化是并行化编译中的重要技术之一,IBM Cell是异构多核处理器,SPMD代表实现OpenMP数组私有化的重要手段,但是SPMD形式的OpenMP程序却不能直接通过IBM XLC(适用于IBM Cell多核平台的编译器)的编译.为了解决该问题,并充分利用IBM Cell本地存储器中的静态缓冲区以减少DMA通信,提出一种IBM Cell多核平台的OpenMP数组私有化技术.旨在充分利用本地存储器、减少DMA通信,集中处理可重用数据的私有化.主要包括:数组私有化分析、数组私有化转换、同步消除与非阻塞DMA操作,从而扩大数据的可重用作用域.转换后的Jacobi迭代代码进行实际测试表明,这种基于IBM Cell多核平台的数组私有化技术能够平均提高3%左右的执行性能,尤其对于小规模计算来说性能提高还会更多.  相似文献   

4.
本文设计并实现了一个基于值一剖面的OpenMP运行时优化系统CCRG OpenMP。它能够根据常见的值的组合优化并行区域,并且在运行时只有并行区代码需要重编译和管理。CCRG OpenMP基于动态重编译技术,避免了目前静态多版本技术的不足。同时,值-剖面的收集和分析由独立的动态优化器线程完成,降低了动态重编译引入的开销。SPEC OMP2001基准测试表明,我们基于值一剖面的Open MP优化系统能够较大地提高程序性能。  相似文献   

5.
Tasking in OpenMP 3.0 has been conceived to handle the dynamic generation of unstructured parallelism. New directives have been added allowing the user to identify units of independent work (tasks) and to define points to wait for the completion of tasks (task barriers). In this document we propose extensions to allow the runtime detection of dependencies between generated tasks, broading the range of applications that can benefit from tasking or improving the performance when load balancing or locality are critical issues for performance. The proposed extensions are evaluated on a SGI Altix multiprocessor architecture using a couple of small applications and a prototype runtime system implementation.  相似文献   

6.
多核处理器环境下必须解决多核处理器的并行编程问题,才能够充分发挥多核处理器的性能.事务存储(Transactional Memory)机制提供了一种在多核环境下程序并行执行和同步的方法.已有的工作已将事务存储扩展到了OpenMP,为程序员提供满足事务原子性、一致性和隔离性的共享存储访问.但当前事务存储的语义并不完善,事务间不能交换中间结果,不能实现锁的部分语义.提出并实现了一种基于开放嵌套的事务存储的同步语义,从而解决了事务间不能交换中间结果的问题,增强了扩展事务存储后OpenMP的并行编程能力.  相似文献   

7.
MPI+OpenMP混合并行编程模型应用研究   总被引:13,自引:0,他引:13  
多处理器结点集群在高性能计算市场上日趋流行,如何在多处理器上编写出高效的并行代码成为研究的热点。MPI+OpenMP为多处理器结点集群提供了一种有效的并行策略,结点内部共享内存空间编程模式适合 OpenMP并行,消息传递模型MPI被用在集群的结点与结点之间,这样就实现了并行的层次结构化。  相似文献   

8.
为了准确分析OpenMP程序的负载均衡问题,详细分析了在同步点之间进行测量的恰当位置,定义了性能分析单元,给出了负载不均衡程度的计算公式,并提出了一种以性能分析单元为分析对象来测量OpenMP并行程序负载平衡的方法。该方法利用Opari对OpenMP源程序自动插入POMP性能监控函数,通过在相关的性能函数中插入定时器的方式,以分析单元为基本对象来收集程序的负载情况。该方法已在一个OpenMP性能分析工具中得到了实现,能够有效地帮助用户找出程序中负载不均衡的瓶颈。  相似文献   

9.
The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems - distributed memory across nodes and shared memory with non-uniform memory access within each node - poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems - a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and two full applications using this approach on several multi-core based systems including an SGI Altix 4700, an IBM p575+ and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP to better match the hierarchical memory structure of multi-core architectures.  相似文献   

10.
为了提高新一代音视频编解码技术标准AVS的编码速度,利用OpenMP在多核处理器平台上研究并实现了AVS的GOP级、条带级,帧级和基于任务队列模型的帧级并行编码算法.对CIF格式的视频序列进行了测试,在四核处理器平台上加速比最高能达到3.82x.另外,基于任务队列模型的帧级并行算法在保持图像质量不变的基础上解决了帧级并行算法加速比偏低的缺点.实验结果表明,OpenMP是一种简单而有效的并行化编程工具,基于OpenMP的各个AVS并行编码算法与原串行算法相比,编码速度都有显著提高.  相似文献   

11.
通常,OpenMP程序开发将开发过程、程序正确性检测和性能分析分离开来.为此,提出动态并行区的概念,并在此基础上提出一种新的OpenMP程序开发模式,将OpenMP程序的开发过程、正确性检测和性能分析紧密地联系起来.在OpenMP程序开发的每一阶段,都能确保程序的正确性;同时,通过精确的性能分析与细微的性能调整,使得OpenMP程序的性能随开发的不断深入而逐步得到改进.据此开发的NPB2.3 OpenMP Fortran版的实测结果显示出该模式的可行性.  相似文献   

12.
OpenMP is an emerging industry standard for shared memory architectures. While OpenMP has advantages on its ease of use and incremental programming, message passing is today still the most widely-used programming model for distributed memory architectures. How to effectively extend OpenMP to distributed memory architectures has been a hot spot. This paper proposes an OpenMP system, called KLCoMP, for distributed memory architectures. Based on the partially replicating shared arrays memory model, we propose ...  相似文献   

13.
研究了一种基于OpenMP技术的多核架构下并行蚁群算法,通过在TSP问题中的实验表明,该算法易于操作,而且充分利用了多核处理器并行计算的优势,提高了算法的运行效率。  相似文献   

14.
陈欢  谢健 《计算机科学》2012,39(106):392-395
随着多核处理器的普及,并为了充分利用多核PC机的特性,计算机技术逐渐向多核架构及多核计算技术发展。为提高对湖南地区100mX 100m小网格气温插值的速度,采用以OpenMP为标准的基于共享存储的并行编程模型对Kriging插值算法进行改进。在不同核的多核PC机中,采用100mX 100m小网格和500mX 500m小网格地形数据对平均气温进行插值,不仅有效减少了插值时间和提高了算法的加速比,而且集成到业务系统中大大提升了系统的反应时间及性能。  相似文献   

15.
OpenMP并行程序的编译器优化   总被引:3,自引:0,他引:3       下载免费PDF全文
OpemMP标准以其良好的可移植性和易用性被广泛应用于并行程序设计。该文讨论了OpenMP并行程序的编译器优化算法,在编译过程中通过并行区合并和扩展,实现并行区重构,并在并行区中实现了基于跨处理器相关图的barrier同步优化。分析验证表明,这些优化策略减少了并行区和barrier同步的数目,有效地提高了OpenMP程序的并行性能。  相似文献   

16.
OpenMP is widely accepted as a de facto standard for shared memory parallel programming in Fortran, C and C++. Nested parallelization has been included in the first OpenMP specification, but it took a few years until the first commercially available compilers supported this optional part of the specification. We employed nested parallelization using OpenMP in three production codes: a C++ code for content-based image retrieval, a C++ code for the computation of critical points in multi-block CFD datasets, and a multi-block Navier-Stokes solver written in Fortran90. In this paper we discuss the opportunities as well as the deficiencies of the nested parallelization support in OpenMP.  相似文献   

17.
基于SMP集群的MPI+OpenMP混合编程模型研究   总被引:4,自引:1,他引:3  
讨论了MPI+OpenMP混合编程模型的特点及其实现方法。建立了对拉普拉斯偏微分方程求解的混合并行算法,并在HL-2A高性能计算系统上同纯MPI算法作了性能方面的比较。结果表明,该混合并行算法具有更好的扩展性和加速比。  相似文献   

18.
为了解决OpenMP程序性能退化问题,本文提出性能退化区和性能退化强度的概念.使用性能退化强度能够剔除非性能退化区并突出执行时间较长的性能退化代码段;同时.性能退化区的分解能够逐步缩小性能退化区并最终准确定位引发性能退化的代码段.去除引发性能退化的根源就能有效改进OpenMP程序的执行性能.实例分析证实了本文提出的OpenMP程序性能退化诊断与处理方法的有效性.  相似文献   

19.
由于指导语句动态嵌套与绑定规则的存在,OpenMP程序中线程的一些上下文只能在运行时刻才能完全确定.然而,通过编译时刻的静态分析可以部分确定指导语句的嵌套类型,这些信息可以用于指导后续的编译与优化.由于函数调用的存在,嵌套与绑定常常会跨越过程边界,除了通常的局部和全局分析之外,还需要过程间分析的支持.通过在通常的过程间分析的基础上附加信息,可以使得嵌套类型信息在过程调用图中进行传播.将这些全局信息与过程内的局部信息结合起来,就可以在编译时刻确定语句的嵌套类型.结果表明,编译时刻的嵌套类型分析可以有效地确定通常的科学与工程计算程序中指导语句的嵌套类型,基于嵌套类型的翻译与优化可以同时减少运行时开销和目标代码长度.  相似文献   

20.
本文研究了基于OpenMP执行模式的功耗优化技术,面向结点具有动态电压调整能力的并行系统提出了基于OpenMP fork/join执行模式的电压调整和扩展障碍同步两种低功耗优化技术及其实现方法;建立了一个能量消耗分析模型。模拟分析显示,本文提出的功耗优化技术能有效地减少并行系统运行OpenMP程序时的能量消耗。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号