1.
2.
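This paper describes the characteristics of parallel computing with MPI and OpenMP and builds a hybrid programming platform based on both in Visual Studio 2010. Programs running on this platform can use multiple processes and multiple threads within each process at the same time. A parallel matrix multiplication algorithm based on data partitioning is designed and implemented: the data are split into two parts, each handled by one of two compute nodes, and within each node the data are partitioned further and handed to multiple threads that execute concurrently. Comparison with serial, MPI-only, and OpenMP-only matrix multiplication confirms that the algorithm makes effective use of the computer's processing power.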
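The two-level data partitioning described in this abstract can be sketched roughly as follows. This is not the authors' code: it is a minimal Python illustration in which the function names (`split`, `multiply_rows`, `hybrid_matmul`) are hypothetical, the "nodes" are simulated by a sequential loop rather than MPI processes, and a thread pool stands in for the OpenMP threads inside each node.

```python
from concurrent.futures import ThreadPoolExecutor


def split(rows, parts):
    """Split a list into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def multiply_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def hybrid_matmul(a, b, nodes=2, threads_per_node=2):
    """Row-partitioned matrix product with two levels of partitioning."""
    result = []
    # Level 1 (assumption: stands in for MPI): one row block per "node".
    # The nodes are visited one after another here, not concurrently.
    for node_rows in split(a, nodes):
        # Level 2 (assumption: stands in for OpenMP): one sub-block per thread.
        chunks = split(node_rows, threads_per_node)
        with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
            # pool.map returns results in submission order, so rows stay ordered.
            for partial in pool.map(multiply_rows, chunks, [b] * len(chunks)):
                result.extend(partial)
    return result


a = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [[1, 0], [0, 1]]  # identity, so the product equals a
print(hybrid_matmul(a, b))
```

Partitioning by rows of A keeps each worker's output disjoint, so no synchronization is needed beyond collecting the partial results in order.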
3.
Nowadays, NVIDIA's CUDA is a general purpose scalable parallel programming model for writing highly parallel applications. It provides several key abstractions – a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many core GPUs and scales transparently to hundreds of cores: scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster which consists of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA run by the processor cores in the same computational node.
4.
Important components of molecular modeling applications are estimation and minimization of the internal energy of a molecule.
For macromolecules such as proteins and amino acids, energy estimation is performed using empirical equations known as force
fields. Over the past several decades, much effort has been directed towards improving the accuracy of these equations, and
the resulting increased accuracy has come at the expense of greater computational complexity. For example, the interactions
between a protein and surrounding water molecules have been modeled with improved accuracy using the generalized Born solvation
model, which increases the computational complexity to $O(n^3)$. Fortunately, many force-field calculations are amenable to parallel execution. This paper describes the steps that were
required to transform the Born calculation from a serial program into a parallel program suitable for parallel execution in
both the OpenMP and MPI environments. Measurements of the parallel performance on a symmetric multiprocessor reveal that the
Born calculation scales well for up to 144 processors. In some cases the OpenMP implementation scales better than the MPI
implementation, but in other cases the MPI implementation scales better than the OpenMP implementation. However, in all cases
the OpenMP implementation performs better than the MPI implementation, and requires less programming effort as well.
5.
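In view of the current state of, and open problems in, the analysis of strong electromagnetic pulse coupling effects in China and abroad, this paper explains the concept of electromagnetic soft kill and proposes a parallel synchronization algorithm for the finite-difference time-domain / general-purpose circuit simulation program (FDTD-SPICE), based on the shared-memory OpenMP standard and message-passing MPI. A dynamic coupling model under electromagnetic pulse excitation is established, which solves the problem of dynamically assessing electromagnetic soft kill on aircraft. Taking a military aircraft as an example, parallel synchronized simulation experiments are carried out on the distributed electromagnetic field coupled into the cockpit by an electromagnetic weapon and on the FDTD-SPICE equivalent circuit, together with a protection assessment. The results show that the algorithm synchronizes well in parallel. When the aircraft is powered on, the peak amplitude is far higher than when it is powered off, so vulnerability increases and protection is weakened; a protected cockpit greatly reduces the effect of electromagnetic weapons on onboard circuits.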
6.
This paper discusses the cloud computing based approach for parallelization of large displacement stability analysis of orthotropic prismatic shell structures with simply supported boundary conditions along the diaphragm-supported edges. We review the harmonic coupled finite strip method (HCFSM), and describe a software system for nonlinear analysis of reinforced concrete (RC) structures. We combine different parallelization models – MPI and OpenMP – in order to cope with the increased computational complexity, which originates from coupling of all series terms in the HCFSM formulation. We discuss the effects of parallelization from the perspective of a cloud environment. Our results show that rational usage of cloud resources can lead to significant performance improvements and monetary savings. In certain cases, the achieved performance can be very close to the maximum one.
7.
OpenMP directive extensions for irregular applications
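The core of many irregular applications is sparse matrix computation, in which a reference to an element of one array depends on the element values of two other arrays, giving the computation an irregular memory access pattern. Targeting this characteristic, this paper proposes a new OpenMP directive clause, indirect, and implements it on the cluster OpenMP system OpenMP/JIAJIA. Tests with a real OpenMP application, Equake, show that the extension is effective: the functions that use the clause directly run 18% faster, and the application as a whole runs 15% faster.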
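The irregular access pattern mentioned in this abstract can be illustrated with a sparse matrix-vector product in compressed sparse row (CSR) form. The sketch below is an assumption chosen for illustration, not the paper's code and not the `indirect` clause itself: the CSR layout and the names `row_ptr`, `col_idx`, `values`, and `csr_matvec` are hypothetical. It shows how the element of `x` that is read depends on the contents of two index arrays.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # row_ptr decides which entries belong to row i ...
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # ... and col_idx decides which element of x is read:
            # the access x[col_idx[k]] is indirect and data-dependent.
            y[i] += values[k] * x[col_idx[k]]
    return y


# CSR encoding of the 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

print(csr_matvec(row_ptr, col_idx, values, x))  # [5.0, 6.0, 19.0]
```

Because the indices are only known at run time, a compiler or runtime cannot tell in advance which parts of `x` each thread will touch, which is the difficulty a directive extension of this kind has to address.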
8.
Hybrid MPI + OpenMP parallelization of an FFT-based 3D Poisson solver with one periodic direction
This work is devoted to the development of efficient parallel algorithms for the direct numerical simulation (DNS) of incompressible flows on modern supercomputers. In doing so, a Poisson equation needs to be solved at each time-step to project the velocity field onto a divergence-free space. Due to the non-local nature of its solution, this elliptic system is the part of the algorithm that is most difficult to parallelize. The Poisson solver presented here is restricted to problems with one uniform periodic direction. It is a combination of a block preconditioned Conjugate Gradient (PCG) and an FFT diagonalization. The latter decomposes the original system into a set of mutually independent 2D systems that are solved by means of the PCG algorithm. For the most ill-conditioned systems, which correspond to the lowest Fourier frequencies, the PCG is replaced by a direct Schur-complement based solver. The previous version of the Poisson solver was conceived for single-core (also dual-core) processors and therefore, the distributed memory model with message-passing interface (MPI) was used. The irruption of multi-core architectures motivated the use of a two-level hybrid MPI + OpenMP parallelization with the shared memory model on the second level. Advantages and implementation details for the additional OpenMP parallelization are presented and discussed in this paper. Numerical experiments show that, within its range of efficient scalability, the previous MPI-only parallelization is slightly outperformed by the MPI + OpenMP approach. More importantly, the hybrid parallelization has made it possible to significantly extend the range of efficient scalability. Here, the solver has been successfully tested on up to 12800 CPU cores for meshes with up to $10^9$ grid points. However, estimations based on the presented results show that this range can potentially be stretched to approximately 200,000 cores.
Finally, several examples of DNS simulations are briefly presented to illustrate some potential applications of the solver.
9.
Haoqiang Jin Dennis Jespersen Piyush Mehrotra Rupak Biswas Lei Huang Barbara Chapman 《Parallel Computing》2011,37(9):562-575
The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems - distributed memory across nodes and shared memory with non-uniform memory access within each node - poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems - a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and two full applications using this approach on several multi-core based systems including an SGI Altix 4700, an IBM p575+ and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP to better match the hierarchical memory structure of multi-core architectures.
10.
11.
Parallel iterative solvers for finite-element methods using an OpenMP/MPI hybrid programming model on the Earth Simulator
Kengo Nakajima 《Parallel Computing》2005,31(10-12):1048
An efficient parallel iterative method for the finite-element method has been developed for symmetric multiprocessor (SMP) cluster architectures with vector processors such as the Earth Simulator. The method is based on a three-level hybrid parallel programming model, including message passing for inter-SMP node communication, loop directives by OpenMP for intra-SMP node parallelization and vectorization for each processing element (PE). Simple 3D linear elastic problems with more than $2.2 \times 10^9$ DOF have been solved using the 3 × 3 block ICCG(0) method with additive Schwarz domain decomposition and PDJDS/CM-RCM reordering on 176 nodes of the Earth Simulator, achieving performance of 3.80 TFLOPS. Furthermore, the effect of the number of colors in reordering has been evaluated on various types of computers.
12.
Rocco Aversa Beniamino Di Martino Nicola Mazzocca Salvatore Venticinque 《Parallel Computing》2005,31(10-12):1034
Branch&Bound (B&B) is a technique widely used to solve combinatorial optimization problems in physics and engineering science. In this paper we show how the combined use of PVM and OpenMP libraries can be a promising approach to exploit the intrinsic parallel nature of this class of application and to obtain efficient code for hybrid computational architectures. We describe how both the shared memory and the distributed memory programming models can be applied to implement the same algorithm for inter-node and intra-node parallelization. Finally, some experimental tests on a local area network (LAN) of workstations are discussed.
13.
14.
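To analyze load balance in OpenMP programs accurately, this paper examines in detail the appropriate places to take measurements between synchronization points, defines a performance analysis unit, gives a formula for the degree of load imbalance, and proposes a method that measures the load balance of OpenMP parallel programs with the performance analysis unit as the object of analysis. The method uses Opari to insert POMP performance monitoring functions into the OpenMP source automatically and, by placing timers in the relevant performance functions, collects the program's load data per analysis unit. The method has been implemented in an OpenMP performance analysis tool and effectively helps users locate load-imbalance bottlenecks in their programs.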
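The abstract does not state the imbalance formula the paper uses, so the sketch below uses a generic measure as an assumption: the fraction of the slowest thread's time that the average thread spends idle between two synchronization points. The function name `load_imbalance` and the sample timings are hypothetical.

```python
def load_imbalance(thread_times):
    """Return (max - mean) / max for per-thread times in one analysis unit.

    0.0 means perfectly balanced; values near 1.0 mean most threads
    wait for a single slow thread. Assumed metric, not the paper's.
    """
    longest = max(thread_times)
    if longest <= 0:
        return 0.0
    mean = sum(thread_times) / len(thread_times)
    return (longest - mean) / longest


# Hypothetical timings (seconds) of four threads between two barriers.
print(load_imbalance([4.0, 4.0, 4.0, 4.0]))  # 0.0
print(load_imbalance([8.0, 4.0, 2.0, 2.0]))  # 0.5
```

Computing the value per region between synchronization points, rather than once for the whole run, is what lets a tool point at the specific parallel region that causes the waiting.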
15.
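To increase the encoding speed of AVS, the new-generation audio and video coding standard, OpenMP is used on a multi-core processor platform to study and implement parallel AVS encoding algorithms at the GOP level, the slice level, and the frame level, as well as a frame-level algorithm based on a task queue model. Tests on CIF-format video sequences reach a speedup of up to 3.82x on a quad-core platform. In addition, the task-queue-based frame-level algorithm overcomes the low speedup of the plain frame-level algorithm while leaving image quality unchanged. The experiments show that OpenMP is a simple and effective parallel programming tool, and that every OpenMP-based parallel AVS encoding algorithm is significantly faster than the original serial algorithm.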
16.
Correctness criteria for transaction execution in hierarchical multidatabase systems
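This paper studies the correctness of transaction execution in hierarchical multidatabases. It gives the definition and structure of a hierarchical multidatabase and the transaction structure built on it, proposes a correctness criterion for transaction execution in hierarchical multidatabases based on the characteristics of multidatabases, illustrates its application with an example, and closes with an evaluation of the criterion and its application prospects.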
17.
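Message passing is a model widely used on parallel machines, especially distributed-memory ones. PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) are both popular message-passing parallel libraries. Because of its simplicity, PVM's message-passing interface does not give users the greatest flexibility to achieve the best performance. For this reason, the working group of the message-passing standard forum drew up the MPI standard, which opens the way for PVM applications to achieve the best performance. By comparing PVM and MPI, this paper points out the advantages and the potential pitfalls of porting a PVM application to MPI. An application that avoids these pitfalls can improve its communication performance through porting, and thus the performance of its distributed computation.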
18.
19.
J. Mark Bull James Enright Xu Guo Chris Maynard Fiona Reid 《International journal of parallel programming》2010,38(5-6):396-417
With the current prevalence of multi-core processors in HPC architectures, mixed-mode programming, using both MPI and OpenMP in the same application, is seen as an important technique for achieving high levels of scalability. As there are few standard benchmarks written in this paradigm, it is difficult to assess the likely performance of such programs. To help address this, we examine the performance of mixed-mode OpenMP/MPI on a number of popular HPC architectures, using a synthetic benchmark suite and two large-scale applications. We find performance characteristics which differ significantly between implementations, and which highlight possible areas for improvement, especially when multiple OpenMP threads communicate simultaneously via MPI.
20.
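Cluster parallel computations often need to output their results graphically. Although MPE in the MPICH package includes basic parallel graphics output, its drawing functions are extremely simple and cannot display the output of applications such as image processing. This paper therefore analyzes the core code of MPE's graphics functions and extends the MPE library, adding new graphics library functions for more complex graphics to handle such output, which reduces programming difficulty and improves efficiency. The same extension approach also applies to visualizing the results of MPI parallel applications in other fields.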