首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 7 毫秒
1.
MPI+OpenMP混合并行编程模型应用研究   总被引:13,自引:0,他引:13  
多处理器结点集群在高性能计算市场上日趋流行,如何在多处理器上编写出高效的并行代码成为研究的热点。MPI+OpenMP为多处理器结点集群提供了一种有效的并行策略,结点内部共享内存空间编程模式适合 OpenMP并行,消息传递模型MPI被用在集群的结点与结点之间,这样就实现了并行的层次结构化。  相似文献   

2.
3.
基于SMP集群的MPI+OpenMP混合编程模型研究   总被引:4,自引:1,他引:3  
讨论了MPI+OpenMP混合编程模型的特点及其实现方法。建立了对拉普拉斯偏微分方程求解的混合并行算法,并在HL-2A高性能计算系统上同纯MPI算法作了性能方面的比较。结果表明,该混合并行算法具有更好的扩展性和加速比。  相似文献   

4.
普通Kriging方法是进行空间降水插值的一种有效方法。然而一方面由于海量数据插值计算量大,另一方面该算法的时间复杂度大,为减少空间降水插值的计算时间,采用OpenMP和MPI混合并行技术,实现Kriging并行算法。在Windows操作系统上搭建并行计算环境,实验数据表明,该并行算法能有效地节省计算时间。  相似文献   

5.
When using a shared memory multiprocessor, the programmer faces the issue of selecting the portable programming model which will provide the best performance. Even if they restricts their choice to the standard programming environments (MPI and OpenMP), they have to select a programming approach among MPI and the variety of OpenMP programming styles. To help the programmer in their decision, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, SPMD) using a subset of the NAS benchmark (CG, MG, FT, LU), two dataset sizes (A and B), and two shared memory multiprocessors (IBM SP3 NightHawk II, SGI Origin 3800). We have developed the first SPMD OpenMP version of the NAS benchmark and gathered other OpenMP versions from independent sources (PBN, SDSC and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared with MPI for a large set of experimental conditions. Not surprisingly, the two best OpenMP versions are those requiring the strongest programming effort. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences. Copyright © 2005 John Wiley & Sons, Ltd.  相似文献   

6.
针对当前搭建集群并行系统复杂且耗时等问题,提出基于Docker搭建并行系统。介绍轻量级虚拟化技术Docker的核心概念和基本架构,并基于Docker技术在Linux平台上搭建集群并行开发环境。简要阐述并行计算的思想,叙述MPI和OpenMP并行计算的基本概念和特点,针对矩阵并行乘法的算法建立MPI和OpenMP的混合编程模型,并给出混合编程模型与MPI并行编程模型以及OpenMP并行编程模型的性能对比,分析出现差异的原因。基于该混合编程模型比较Docker与传统物理机两者搭建的并行系统的并行效率。  相似文献   

7.
The purpose of this paper is to investigate the scalability and performance of seven, simple OpenMP test programs and to compare their performance with equivalent MPI programs on an SGI Origin 2000. Data distribution directives were used to make sure that the OpenMP implementation had the same data distribution as the MPI implementation. For the matrix‐times‐vector (test 5) and the matrix‐times‐matrix (test 7) tests, the syntax allowed in OpenMP 1.1 does not allow OpenMP compilers to be able to generate efficient code since the reduction clause is not currently allowed for arrays. (This problem is corrected in OpenMP 2.0.) For the remaining five tests, the OpenMP version performed and scaled significantly better than the corresponding MPI implementation, except for the right shift test (test 2) for a small message. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

8.
为有效监控红外弱小目标运动的全过程,必须采用多个波段同时探测,但是多波段探测必然带来计算时间的大幅增长,无法满足实际应用中对目标检测实时性的要求。针对这一问题,本文提出一种基于MPI+OpenMP的层次化并行方法,充分利用消息传递模型和共享存储模型的优势,并基于多处理器节点集群进行测试。实验结果表明,该并行程序在保证相同的检测概率的情况下加速比达到8.61,极大地提高了目标检测的效率。  相似文献   

9.
阐述MPI与OpenMP进行并行计算的特点,并在Visual Studio 2010上构建一个基于两者的混合编程平台。程序在该平台上执行时能够同时实现多进程与进程内多线程编程,设计并实现一种基于数据划分的矩阵乘法的并行算法,将数据分解为两部分交给两个计算节点分别完成,并在每个计算节点内将数据进一步划分,交给多个线程同时执行。通过与非并行矩阵乘法、MPI矩阵乘法、OpenMP矩阵乘法运算性能进行比较,验证该算法可以有效地挖掘计算机的处理能力。  相似文献   

10.
针对对称逐步超松驰预处理共轭梯度(Symmetric Successive Over Relaxation Preconditioned Conjugate Gradient,SSOR-PCG)法并行化时每步迭代都要并行求解2个三角方程组的困难,采用多色排序技术提高并行度,基于MPI+OpenMP混合编程模型开发适合于分布共享内存计算机的并行程序,通过测试选择有效的MPI通信函数,并给出3种避免共享数据竞争的措施,供不同规模问题和不同内存容量计算机情况选用.  相似文献   

11.
Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters   总被引:2,自引:0,他引:2  
Nowadays, NVIDIA's CUDA is a general purpose scalable parallel programming model for writing highly parallel applications. It provides several key abstractions – a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many core GPUs and scales transparently to hundreds of cores: scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA OpenMP, and MPI programming, which partition loop iterations according to the number of C1060 GPU nodes in a GPU cluster which consists of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA run by the processor cores in the same computational node.  相似文献   

12.
Important components of molecular modeling applications are estimation and minimization of the internal energy of a molecule. For macromolecules such as proteins and amino acids, energy estimation is performed using empirical equations known as force fields. Over the past several decades, much effort has been directed towards improving the accuracy of these equations, and the resulting increased accuracy has come at the expense of greater computational complexity. For example, the interactions between a protein and surrounding water molecules have been modeled with improved accuracy using the generalized Born solvation model, which increases the computational complexity to O (n 3). Fortunately, many force-field calculations are amenable to parallel execution. This paper describes the steps that were required to transform the Born calculation from a serial program into a parallel program suitable for parallel execution in both the OpenMP and MPI environments. Measurements of the parallel performance on a symmetric multiprocessor reveal that the Born calculation scales well for up to 144 processors. In some cases the OpenMP implementation scales better than the MPI implementation, but in other cases the MPI implementation scales better than the OpenMP implementation. However, in all cases the OpenMP implementation performs better than the MPI implementation, and requires less programming effort as well. Trademark Legend Sun, Sun Microsystems, SPARC, UltraSPARC, Sun Fire, Sun Performance Library and Sun HPC Cluster Tools are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.  相似文献   

13.
针对国内外强电磁脉冲耦合效应分析现状及问题,阐述了电磁软杀伤概念,提出了基于共享存储OpenMP标准与消息传递MPI的时域有限差分-通用电路仿真程序FDTD-SPICE并行同步算法,建立了在电磁脉冲激励作用下的动态耦合模型,解决了飞机电磁软杀伤动态评估的难题;以某军机为例,开展了电磁武器的座舱耦合分布电磁场和FDTD-SPICE等效电路的并行同步仿真试验与防护评估分析.试验结果表明,该算法并行同步效果良好,军机在通电状态下,军机在通电状态下,幅度最值大大高于不通电条件下的幅度最值,易损性增强,防护效果减弱;防护座舱能大大减轻电磁武器对机载电路的影响.  相似文献   

14.
The parallelization of heuristic methods allows the researchers both to explore the solution space more extensively and to accelerate the search process. Nowadays, there is an increasing interest on developing parallel algorithms using standard software components that take advantage of modern microprocessors including several processing cores with local and shared cache memories. The aim of this paper is to show it is possible to parallelize algorithms included in computational software using standard software libraries in low-cost multi-core systems, instead of using expensive high-performance systems or supercomputers. In particular, it is analyzed the benefits provided by master-worker and island parallel models, implemented with MPI and OpenMP software libraries, to parallelize population-based meta-heuristics. The capacitated vehicle routing problem with hard time windows (VRPTW) has been used to evaluate the performance of these parallel strategies. The empirical results for a set of Solomon's benchmarks show that the parallel approaches executed on a multi-core processor produce better solutions than the sequential algorithm with respect to both the quality of the solutions obtained and the runtime required to get them. Both MPI and OpenMP parallel implementations are able to obtain better or at least equal solutions (in terms of distance traveled) than the best known ones for the considered benchmark instances.  相似文献   

15.
计算机集群技术已经引起了石油地球物理界的广泛关注,如何将现有地震数据处理模块快速、高效地移植到集群上已成为地震数据处理需要解决的重大课题。本文将现有的基于消息传递(MPI)的并行地震处理模型与共享存储(OpenMP)模型相结合,实现了一个适合于SMP集群的并行地震数据支撑库,将涉及到消息传递的并行地震数据操作以直观的API的形式提供给开发人员。本文利用支撑库提供的API开发了一些测试模块。实验证明,支撑库可支持现有地震数据处理和显示模块的多种并行计算模型,并且能够获得较高的并行加速比和计算效率。  相似文献   

16.
网格生成是计算流体力学中非常重要的一环,大规模数值模拟过程中对网格精度要求的提高会导致网格生成所耗的时间增加。文中基于OpenFoam开源软件中的网格生成算法,主要研究多面体网格的并行生成,并提出OpenMP和MPI混合并行的多面体网格生成方法。通过理论分析得到,使用混合并行方法生成相同质量的网格时,混合并行方法生成网格的时间消耗随着线程数量和网格单元数量的增加而减少。3组使用不同求解器的数值模拟实验结果表明,该混合并行方法不但可以保证生成网格的质量——可以正常进行数值计算模拟且模拟结果与原方法相比几乎没有差别,而且生成同样质量与数量网格的耗时最多可以缩短至未使用OpenMP并行方法之耗时的1/4以内。  相似文献   

17.
This paper discusses the cloud computing based approach for parallelization of large displacement stability analysis of orthotropic prismatic shell structures with simply supported boundary conditions along the diaphragm-supported edges. We review the harmonic coupled finite strip method (HCFSM), and describe a software system for nonlinear analysis of reinforced concrete (RC) structures. We combine different parallelization models – MPI and OpenMP – in order to cope with the increased computational complexity, which originates from coupling of all series terms in the HCFSM formulation. We discuss the effects of parallelization from the perspective of a cloud environment. Our results show that rational usage of cloud resources can lead to significant performance improvements and monetary savings. In certain cases, the achieved performance can be very close to the maximum one.  相似文献   

18.
基于Petri网的工作流建模与正确性分析   总被引:7,自引:0,他引:7  
目前用于工作流建模和分析的工具很多,Petri网以其坚实的数学基础、直观的图形表示受到大家的青昧。本文介绍了如何使用WF-Net建立工作流模型,并根据Aalst给出的WF-Net正确性定义提出了一个算法,用来检查该WF-Net的正确性。  相似文献   

19.
针对非规则应用的OpenMP制导扩展   总被引:1,自引:0,他引:1  
许多非规则应用的棱心是稀疏矩阵运算.稀疏矩阵运算的特点是对一个数组元素的引用依赖于另两个数组的元素值,因此具有非规则访存特点.本文针对稀疏矩阵运算特点,提出一种新的OpenMP制导子句indirect,并在机群OpenMP系统OpenMP/JIAJIA上进行了实现.采用一个实的OpenMP应用Equake进行了测试,测试结果表明该制导扩展很有效,对于直接使用该制导子句的函数代码,其性能改进了18%,而整个应用的性能改进了15%.  相似文献   

20.
This work is devoted to the development of efficient parallel algorithms for the direct numerical simulation (DNS) of incompressible flows on modern supercomputers. In doing so, a Poisson equation needs to be solved at each time-step to project the velocity field onto a divergence-free space. Due to the non-local nature of its solution, this elliptic system is the part of the algorithm that is most difficult to parallelize. The Poisson solver presented here is restricted to problems with one uniform periodic direction. It is a combination of a block preconditioned Conjugate Gradient (PCG) and an FFT diagonalization. The latter decomposes the original system into a set of mutually independent 2D systems that are solved by means of the PCG algorithm. For the most ill-conditioned systems, that correspond to the lowest Fourier frequencies, the PCG is replaced by a direct Schur-complement based solver.The previous version of the Poisson solver was conceived for single-core (also dual-core) processors and therefore, the distributed memory model with message-passing interface (MPI) was used. The irruption of multi-core architectures motivated the use of a two-level hybrid MPI + OpenMP parallelization with the shared memory model on the second level. Advantages and implementation details for the additional OpenMP parallelization are presented and discussed in this paper. Numerical experiments show that, within its range of efficient scalability, the previous MPI-only parallelization is slightly outperformed by the MPI + OpenMP approach. But more importantly, the hybrid parallelization has allowed to significantly extend the range of efficient scalability. Here, the solver has been successfully tested up to 12800 CPU cores for meshes with up to 109 grid points. However, estimations based on the presented results show that this range can be potentially stretched up until 200,000 cores approximately. Finally, several examples of DNS simulations are briefly presented to illustrate some potential applications of the solver.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号