1.

Performance prediction through simulation of a hybrid MPI/OpenMP application (cited by: 1; self-citations: 0; other citations: 1)

Rocco Aversa Beniamino Di Martino Massimiliano Rak Salvatore Venticinque Umberto Villano 《Parallel Computing》2005,31(10-12):1013

2.

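A parallel matrix multiplication algorithm based on MPI and OpenMP

Zhang Yanhua (张艳华), Liu Xianggang (刘祥港) 《Computer and Modernization》(计算机与现代化) 2011,(7):84-87

The paper describes the characteristics of parallel computing with MPI and OpenMP and builds a hybrid programming platform based on both in Visual Studio 2010. Programs running on this platform can combine multi-process programming with multithreading inside each process. A parallel matrix multiplication algorithm based on data partitioning is designed and implemented: the data are split into two parts that are assigned to two compute nodes, and within each node the data are partitioned further and handed to multiple threads that execute concurrently. Comparison with serial, MPI-only, and OpenMP-only matrix multiplication shows that the algorithm effectively exploits the machine's processing capability.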
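The two-level data partitioning described in this abstract can be sketched roughly as follows. This is an illustration only, not the authors' code: the function names (`split_rows`, `multiply_rows`, `hybrid_matmul`) are hypothetical, the outer loop merely stands in for MPI processes on separate nodes, and a Python thread pool stands in for OpenMP threads. Because of Python's global interpreter lock the sketch shows the structure of the decomposition, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, parts):
    """Return `parts` contiguous (start, stop) ranges that cover 0..n_rows."""
    base, extra = divmod(n_rows, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def multiply_rows(a, b, start, stop):
    """Compute rows start..stop-1 of the matrix product a * b."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(start, stop)]


def hybrid_matmul(a, b, nodes=2, threads_per_node=2):
    """Two-level row decomposition: first across nodes, then across threads."""
    result = []
    # Level 1: one block of rows per "node" (assumed stand-in for an MPI rank).
    for node_start, node_stop in split_rows(len(a), nodes):
        # Level 2: split the node's block among threads (stand-in for OpenMP).
        chunks = split_rows(node_stop - node_start, threads_per_node)
        with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
            futures = [pool.submit(multiply_rows, a, b,
                                   node_start + s, node_start + e)
                       for s, e in chunks]
            # Collect in submission order so the rows stay in order.
            for future in futures:
                result.extend(future.result())
    return result
```

Calling `hybrid_matmul(a, b)` with the defaults mirrors the two-node split mentioned in the abstract; the number of threads per node is a free parameter here.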

3.

Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters (cited by: 2; self-citations: 0; other citations: 2)

Chao-Tung Yang Chih-Lin Huang Cheng-Fang Lin 《Computer Physics Communications》2011,(1):266-269

Nowadays, NVIDIA's CUDA is a general purpose scalable parallel programming model for writing highly parallel applications. It provides several key abstractions – a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many core GPUs and scales transparently to hundreds of cores: scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster which consists of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA run by the processor cores in the same computational node.

4.

High-Scalability Parallelization of a Molecular Modeling Application: Performance and Productivity Comparison Between OpenMP and MPI Implementations

Russell Brown Ilya Sharapov 《International journal of parallel programming》2007,35(5):441-458

Important components of molecular modeling applications are estimation and minimization of the internal energy of a molecule. For macromolecules such as proteins and amino acids, energy estimation is performed using empirical equations known as force fields. Over the past several decades, much effort has been directed towards improving the accuracy of these equations, and the resulting increased accuracy has come at the expense of greater computational complexity. For example, the interactions between a protein and surrounding water molecules have been modeled with improved accuracy using the generalized Born solvation model, which increases the computational complexity to O(n³). Fortunately, many force-field calculations are amenable to parallel execution. This paper describes the steps that were required to transform the Born calculation from a serial program into a parallel program suitable for parallel execution in both the OpenMP and MPI environments. Measurements of the parallel performance on a symmetric multiprocessor reveal that the Born calculation scales well for up to 144 processors. In some cases the OpenMP implementation scales better than the MPI implementation, but in other cases the MPI implementation scales better than the OpenMP implementation. However, in all cases the OpenMP implementation performs better than the MPI implementation, and requires less programming effort as well. Trademark Legend: Sun, Sun Microsystems, SPARC, UltraSPARC, Sun Fire, Sun Performance Library and Sun HPC Cluster Tools are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

5.

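Dynamic evaluation and analysis of aircraft electromagnetic soft kill based on MPI and OpenMP

Huang Jun (黄隽), Zhang Haoran (张浩然), Hu Yun'an (胡云安), Jin Yan (金焱) 《Computer & Digital Engineering》(计算机与数字工程) 2010,38(12)

Against the background of current work, in China and abroad, on the coupling effects of strong electromagnetic pulses and its open problems, the paper explains the concept of electromagnetic soft kill and proposes a parallel synchronization algorithm for FDTD-SPICE, a finite-difference time-domain and general-purpose circuit simulation program, based on the shared-memory OpenMP standard and message-passing MPI. It establishes a dynamic coupling model under electromagnetic pulse excitation, which addresses the difficulty of dynamically evaluating electromagnetic soft kill of aircraft. Taking a military aircraft as an example, the authors carry out parallel synchronized simulation experiments and a protection evaluation covering the distributed electromagnetic field coupled into the cockpit by an electromagnetic weapon and the FDTD-SPICE equivalent circuit. The results show that the algorithm synchronizes well in parallel. When the aircraft is powered on, the peak amplitude is much higher than when it is powered off, so vulnerability increases and protection weakens. A shielded cockpit greatly reduces the effect of electromagnetic weapons on onboard circuits.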

6.

Hybrid MPI/OpenMP cloud parallelization of harmonic coupled finite strip method applied on reinforced concrete prismatic shell structure

《Advances in Engineering Software》2015

This paper discusses the cloud computing based approach for parallelization of large displacement stability analysis of orthotropic prismatic shell structures with simply supported boundary conditions along the diaphragm-supported edges. We review the harmonic coupled finite strip method (HCFSM), and describe a software system for nonlinear analysis of reinforced concrete (RC) structures. We combine different parallelization models – MPI and OpenMP – in order to cope with the increased computational complexity, which originates from coupling of all series terms in the HCFSM formulation. We discuss the effects of parallelization from the perspective of a cloud environment. Our results show that rational usage of cloud resources can lead to significant performance improvements and monetary savings. In certain cases, the achieved performance can be very close to the maximum one.

7.

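An OpenMP directive extension for irregular applications (cited by: 1; self-citations: 0; other citations: 1)

Gu Lihong (顾丽红), Wu Shaogang (吴少刚), Zhang Longbing (章隆兵), Cai Fei (蔡飞) 《Journal of Chinese Computer Systems》(小型微型计算机系统) 2005,26(1):124-128

The core of many irregular applications is sparse matrix computation. Its distinguishing feature is that a reference to an element of one array depends on the element values of two other arrays, which produces an irregular memory access pattern. Targeting this feature, the paper proposes a new OpenMP directive clause, indirect, and implements it on the cluster OpenMP system OpenMP/JIAJIA. Tests with a real OpenMP application, Equake, show that the directive extension is effective: the function code that uses the clause directly ran 18% faster, and the application as a whole ran 15% faster.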
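The kind of indirect reference the abstract refers to can be illustrated with a sparse matrix-vector product. The sketch below assumes the common CSR (compressed sparse row) layout, which the abstract does not name; the function and parameter names are hypothetical, and the sketch does not show the syntax of the proposed `indirect` clause. The point is only that the element of `x` being read is determined by the contents of two other arrays, `row_ptr` and `col_idx`.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Multiply a CSR-format sparse matrix by the dense vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # row_ptr decides which entries belong to row i ...
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # ... and col_idx decides which element of x is touched,
            # so the access to x is irregular and data dependent.
            y[i] += values[k] * x[col_idx[k]]
    return y
```

For the 2x2 matrix with rows (2, 0) and (1, 3), the CSR arrays would be `row_ptr = [0, 1, 3]`, `col_idx = [0, 0, 1]`, and `values = [2, 1, 3]`.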

8.

Hybrid MPI + OpenMP parallelization of an FFT-based 3D Poisson solver with one periodic direction (cited by: 1; self-citations: 0; other citations: 1)

A. Gorobets F.X. Trias R. Borrell O. Lehmkuhl A. Oliva 《Computers & Fluids》2011,49(1):101-109

This work is devoted to the development of efficient parallel algorithms for the direct numerical simulation (DNS) of incompressible flows on modern supercomputers. In doing so, a Poisson equation needs to be solved at each time-step to project the velocity field onto a divergence-free space. Due to the non-local nature of its solution, this elliptic system is the part of the algorithm that is most difficult to parallelize. The Poisson solver presented here is restricted to problems with one uniform periodic direction. It is a combination of a block preconditioned Conjugate Gradient (PCG) and an FFT diagonalization. The latter decomposes the original system into a set of mutually independent 2D systems that are solved by means of the PCG algorithm. For the most ill-conditioned systems, which correspond to the lowest Fourier frequencies, the PCG is replaced by a direct Schur-complement based solver. The previous version of the Poisson solver was conceived for single-core (also dual-core) processors and therefore, the distributed memory model with message-passing interface (MPI) was used. The irruption of multi-core architectures motivated the use of a two-level hybrid MPI + OpenMP parallelization with the shared memory model on the second level. Advantages and implementation details for the additional OpenMP parallelization are presented and discussed in this paper. Numerical experiments show that, within its range of efficient scalability, the previous MPI-only parallelization is slightly outperformed by the MPI + OpenMP approach. But more importantly, the hybrid parallelization has significantly extended the range of efficient scalability. Here, the solver has been successfully tested up to 12800 CPU cores for meshes with up to 10⁹ grid points. However, estimations based on the presented results show that this range can potentially be stretched to approximately 200,000 cores. Finally, several examples of DNS simulations are briefly presented to illustrate some potential applications of the solver.

9.

High performance computing using MPI and OpenMP on multi-core parallel systems (cited by: 1; self-citations: 0; other citations: 1)

Haoqiang Jin Dennis Jespersen Piyush Mehrotra Rupak Biswas Lei Huang Barbara Chapman 《Parallel Computing》2011,37(9):562-575

The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems - distributed memory across nodes and shared memory with non-uniform memory access within each node - poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems - a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and two full applications using this approach on several multi-core based systems including an SGI Altix 4700, an IBM p575+ and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP to better match the hierarchical memory structure of multi-core architectures.

10.

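Parallel algorithms for embedded zerotree wavelet compression and decompression

Han Lijie (韩丽洁), Li Wentian (李文田), Yan Jia (晏嘉) 《Journal of Computer Applications》(计算机应用) 2009,29(Z1)

The embedded zerotree wavelet compression algorithm is an effective image compression technique, but its compression time is long. The paper studies the algorithm and implements parallel versions of it on a multi-core cluster system, improving its performance. Two parallel algorithms are implemented, one using MPI and one using MPI+OpenMP, and both are compared with the serial algorithm. The results show that as the data volume grows, both parallel algorithms run markedly more efficiently than the serial algorithm, and the MPI+OpenMP algorithm is the more efficient of the two.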

11.

Parallel iterative solvers for finite-element methods using an OpenMP/MPI hybrid programming model on the Earth Simulator (cited by: 1; self-citations: 0; other citations: 1)

Kengo Nakajima 《Parallel Computing》2005,31(10-12):1048

An efficient parallel iterative method for the finite-element method has been developed for symmetric multiprocessor (SMP) cluster architectures with vector processors such as the Earth Simulator. The method is based on a three-level hybrid parallel programming model, including message passing for inter-SMP node communication, loop directives by OpenMP for intra-SMP node parallelization and vectorization for each processing element (PE). Simple 3D linear elastic problems with more than 2.2 × 10⁹ DOF have been solved using a 3 × 3 block ICCG(0) method with additive Schwarz domain decomposition and PDJDS/CM-RCM reordering on 176 nodes of the Earth Simulator, achieving performance of 3.80 TFLOPS. Furthermore, the effect of the color number in reordering has been evaluated on various types of computers.

12.

A hierarchical distributed-shared memory parallel Branch&Bound application with PVM and OpenMP for multiprocessor clusters

Rocco Aversa Beniamino Di Martino Nicola Mazzocca Salvatore Venticinque 《Parallel Computing》2005,31(10-12):1034

Branch&Bound (B&B) is a technique widely used to solve combinatorial optimization problems in physics and engineering science. In this paper we show how the combined use of PVM and OpenMP libraries can be a promising approach to exploit the intrinsic parallel nature of this class of application and to obtain efficient code for hybrid computational architectures. We describe how both the shared memory and the distributed memory programming models can be applied to implement the same algorithm for the inter-node and intra-node parallelization. Some experimental tests on a local area network (LAN) of workstations are finally discussed.

13.

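Applied research on the MPI+OpenMP hybrid parallel programming model (cited by: 13; self-citations: 0; other citations: 13)

Feng Yun (冯云), Zhou Shuqiu (周淑秋) 《Computer Systems & Applications》(计算机系统应用) 2006,15(2):86-89

Clusters of multiprocessor nodes are increasingly popular in the high-performance computing market, and how to write efficient parallel code for them has become an active research topic. MPI+OpenMP offers an effective parallel strategy for such clusters: the shared-memory programming model inside a node suits OpenMP parallelism, while the message-passing model MPI is used between the nodes of the cluster, which gives the parallelism a hierarchical structure.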

14.

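A POMP-based load-balance analysis method for OpenMP programs

Yin Shunchang (殷顺昌), Zhao Kejia (赵克佳) 《Computer Engineering and Applications》(计算机工程与应用) 2006,42(35):84-87

To analyze load balance in OpenMP programs accurately, the paper examines in detail where measurements should be taken between synchronization points, defines a performance analysis unit, gives a formula for the degree of load imbalance, and proposes a method that measures the load balance of OpenMP parallel programs with the performance analysis unit as the object of analysis. The method uses Opari to insert POMP performance-monitoring functions into the OpenMP source automatically, and it collects load data per analysis unit by inserting timers into the relevant performance functions. The method has been implemented in an OpenMP performance analysis tool and helps users locate load-imbalance bottlenecks in their programs.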
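The abstract mentions a formula for the degree of load imbalance but does not state it. As a rough illustration of what such a measure can look like, the sketch below uses one widely used definition, (max − mean) / max over the per-thread times measured between two synchronization points. This is an assumption chosen for illustration, not the paper's formula, and the function name `load_imbalance` is hypothetical.

```python
def load_imbalance(thread_times):
    """Return (max - mean) / max for per-thread times in one analysis unit.

    0.0 means perfectly balanced; values near 1.0 mean one thread does
    almost all of the work while the others wait at the synchronization point.
    """
    if not thread_times:
        raise ValueError("need at least one per-thread time")
    longest = max(thread_times)
    if longest == 0:
        return 0.0  # no measurable work, treat as balanced
    mean = sum(thread_times) / len(thread_times)
    return (longest - mean) / longest
```

With four threads that take 1, 1, 1, and 5 time units, the mean is 2 and the measure is (5 − 2) / 5 = 0.6.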

15.

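Research and implementation of OpenMP-based parallel AVS encoding algorithms

Hu Wen'an (胡文安), Yu Hongyang (于鸿洋) 《Computer Engineering and Design》(计算机工程与设计) 2010,31(10)

To increase the encoding speed of AVS, the new-generation audio and video coding standard, the paper uses OpenMP on a multi-core processor platform to study and implement four parallel AVS encoding algorithms: GOP-level, slice-level, frame-level, and frame-level based on a task-queue model. Tests on CIF-format video sequences reach a speedup of up to 3.82x on a quad-core platform. In addition, the task-queue-based frame-level algorithm overcomes the low speedup of the plain frame-level algorithm while keeping image quality unchanged. The experiments show that OpenMP is a simple and effective parallel programming tool, and that every OpenMP-based parallel AVS encoding algorithm is significantly faster than the original serial algorithm.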
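To put the reported speedup in context, parallel efficiency is conventionally defined as speedup divided by the number of cores. The short sketch below applies that standard definition to the figures in the abstract; the function name is hypothetical and the efficiency value is derived here, not reported by the authors.

```python
def parallel_efficiency(speedup, cores):
    """Standard definition: fraction of ideal linear speedup achieved."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures from the abstract: best speedup 3.82x on a quad-core platform.
efficiency = parallel_efficiency(3.82, 4)
```

Under this definition, 3.82 / 4 gives an efficiency of about 0.955, that is, roughly 95.5% of ideal linear scaling on four cores.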

16.

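Correctness criteria for transaction execution in hierarchical multidatabase systems (cited by: 1; self-citations: 1; other citations: 1)

Chen Guoning (陈国宁), Li Taoshen (李陶深) 《Computer Engineering》(计算机工程) 2005,31(6):52-54

The paper studies the correctness of transaction execution in hierarchical multidatabases. It gives the definition and structure of a hierarchical multidatabase and the transaction structure built on top of it, proposes a correctness criterion for transaction execution in hierarchical multidatabases based on the characteristics of multidatabases, illustrates its application with an example, and closes with an evaluation of the criterion and its application prospects.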

17.

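On porting PVM applications to MPI

Zhu Jianqiu (朱建秋), Zhou Lijuan (周丽娟) 《Computer Engineering and Applications》(计算机工程与应用) 1999,35(3):27-29

Message passing is a model widely used on parallel machines, especially distributed-memory parallel machines. PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) are both popular message-passing parallel libraries. Because PVM's message-passing interface is simple, it does not give users maximum flexibility to achieve the best performance. For this reason the working group of the message-passing standard forum defined the MPI standard, which makes optimal performance possible for PVM. By comparing PVM and MPI, the paper identifies the benefits and the potential pitfalls of porting a PVM application to MPI. An application that can avoid these pitfalls can gain communication performance from the port and thereby improve its distributed computing performance.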

18.

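A parallel computing method based on LAM/MPI

Zhang Min (张敏), Li Chunqiang (李春强), Ma Qi (马琪) 《Computer and Modernization》(计算机与现代化) 2006,(4)

The paper introduces MPI, one of the design standards for parallel computing, and studies LAM, an implementation of MPI, in depth, with the aim of using LAM to speed up matrix operations in VLSI design. LAM/MPI is a high-quality implementation of the MPI standard that delivers high performance across different platforms.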

19.

Performance Evaluation of Mixed-Mode OpenMP/MPI Implementations

J. Mark Bull James Enright Xu Guo Chris Maynard Fiona Reid 《International journal of parallel programming》2010,38(5-6):396-417

With the current prevalence of multi-core processors in HPC architectures, mixed-mode programming, using both MPI and OpenMP in the same application, is seen as an important technique for achieving high levels of scalability. As there are few standard benchmarks written in this paradigm, it is difficult to assess the likely performance of such programs. To help address this, we examine the performance of mixed-mode OpenMP/MPI on a number of popular HPC architectures, using a synthetic benchmark suite and two large-scale applications. We find performance characteristics which differ significantly between implementations, and which highlight possible areas for improvement, especially when multiple OpenMP threads communicate simultaneously via MPI.

20.

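Analysis and extension of MPE graphics functions in the MPI environment

Luo Qiuming (罗秋明), Wang Mei (王梅), Li Jing (李晶) 《Computer Engineering and Applications》(计算机工程与应用) 2006,42(19):87-89

Cluster parallel computations often need to output their results graphically. The MPE component of the MPICH package includes basic parallel graphics output, but its drawing functions are extremely simple and cannot display the output of applications such as image processing. The paper therefore analyzes the core code of MPE's graphics functions and extends the MPE library, adding new graphics library functions that handle more complex output tasks, which reduces programming effort and improves efficiency. The same extension approach applies to visualizing the results of MPI parallel applications in other fields.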