Similar literature
17 similar documents found.
1.
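Among the large number of unstructured MPI programs in existence, many code segments built on point-to-point communication primitives can be replaced by the corresponding structured collective communication primitives. Building on an analysis of MPI code structure, this paper proposes a method for converting point-to-point communication primitives in MPI programs into collective communication primitives. The method first analyzes the internal structure of the unstructured MPI code and builds a system of Diophantine inequalities, then uses the Omega library to compute the set of communication patterns of the point-to-point code segments, and finally applies data exchange analysis to determine the corresponding collective communication primitive and substitute it. A worked example is also given.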
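The last stage of the pipeline described above, mapping a recovered set of sender/receiver pairs to a collective-like pattern, can be illustrated with a small sketch. It assumes the communication pairs have already been extracted; it does not solve Diophantine inequalities or call the Omega library, and the function name `classify_pattern` and its labels are hypothetical. Distinguishing a broadcast from a scatter (or a gather from a reduce) would still need the data exchange analysis the abstract mentions.

```python
from typing import Iterable, Set, Tuple


def classify_pattern(pairs: Iterable[Tuple[int, int]], nprocs: int) -> str:
    """Classify (sender, receiver) pairs among ranks 0..nprocs-1.

    Hypothetical helper for illustration only; ranks are assumed valid.
    """
    # Ignore self-messages; they carry no inter-process communication.
    edges: Set[Tuple[int, int]] = {(s, d) for s, d in pairs if s != d}
    if not edges:
        return "none"

    senders = {s for s, _ in edges}
    receivers = {d for _, d in edges}
    everyone = set(range(nprocs))

    # A single root sends to every other rank: broadcast or scatter candidate.
    if len(senders) == 1:
        root = next(iter(senders))
        if receivers == everyone - {root}:
            return "one-to-all"

    # Every other rank sends to a single root: gather or reduce candidate.
    if len(receivers) == 1:
        root = next(iter(receivers))
        if senders == everyone - {root}:
            return "all-to-one"

    # Every ordered pair of distinct ranks communicates.
    if len(edges) == nprocs * (nprocs - 1):
        return "all-to-all"

    return "unstructured"
```

For example, `classify_pattern([(0, 1), (0, 2), (0, 3)], 4)` reports a one-to-all pattern, which a conversion tool might then consider replacing with a broadcast or scatter call.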

2.
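Yang Hao (杨浩), Wang Yuenan (王越男). Computer Simulation (《计算机仿真》), 2020, 37(4): 173-177
Traditional parallel conversion methods for point-to-point communication primitives cannot analyze internal data structures in a centralized way, which leads to poor overall conversion results. To address this, a parallel conversion method for point-to-point communication primitives in MPI programs is proposed. The method analyzes the data structure of the current primitive code and performs the corresponding structuring operations. Based on the uplink data exchanged under parallel decoding, it derives the collision probability of data nodes through theoretical analysis. It then introduces the high-density MDSCAN clustering algorithm to classify symbols into data clusters, and uses the communication patterns computed with the Omega database to convert the primitives, so that communication primitives are converted into primitive data sets. Experimental results show that the primitive data sets produced by the method have a markedly higher compression-resistance ratio and data fit and better data significance, and that the conversion results are better overall.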

3.
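Mapping the MPI (Message Passing Interface) process topology effectively onto the processor topology helps improve the communication performance of MPI programs. Most current MPI process mapping work considers only point-to-point communication and rarely considers collective communication, because obtaining the process topology of collective communication is difficult. Most current profiling tools consider only the interface semantics of a collective function and ignore its implementation semantics, so they cannot correctly capture the detailed communication between the processes taking part in a collective. This paper proposes a set of profiling algorithms that accurately compute the communication volume between every pair of processes participating in a collective and present the process topology as a communication matrix. Experiments confirm the correctness of the profiling algorithms and show that the process topology obtained this way improves the results of mapping processes to processor cores.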
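The difference between interface semantics and implementation semantics can be made concrete with a broadcast. At the interface level the root "sends to everyone", but a tree-based implementation moves data between quite different process pairs. The sketch below builds the communication matrix for a binomial-tree broadcast. It assumes one particular tree ordering and a fixed message size; real MPI libraries choose among several algorithms, and the function name `binomial_bcast_matrix` is hypothetical rather than taken from the paper.

```python
from typing import List


def binomial_bcast_matrix(nprocs: int, msg_bytes: int, root: int = 0) -> List[List[int]]:
    """Return matrix[src][dst] = bytes sent from src to dst during one
    binomial-tree broadcast (illustrative assumption, not a library's
    actual schedule)."""
    matrix = [[0] * nprocs for _ in range(nprocs)]
    mask = 1
    while mask < nprocs:
        # Relative ranks 0..mask-1 already hold the data in this round;
        # each forwards it to the relative rank `mask` positions away.
        for rel in range(mask):
            peer = rel + mask
            if peer < nprocs:
                src = (rel + root) % nprocs
                dst = (peer + root) % nprocs
                matrix[src][dst] += msg_bytes
        mask <<= 1
    return matrix
```

With four processes and root 0, the matrix shows traffic on the pairs 0→1, 0→2 and 1→3, whereas a profile based only on interface semantics would attribute all three messages to rank 0.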

4.
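Existing communication optimization algorithms cannot make an MPI auto-parallelizing compiler generate message-passing programs with satisfactory speedup. To address this, a communication optimization algorithm based on reordering transformations and loop distribution is proposed. Given the interprocedural side-effect sets and reordering rules based on moving mpi_wait/mpi_irecv calls, the algorithm applies reordering transformations and loop distribution in an orderly way to widen, as safely as possible, the window in which communication overlaps computation for non-blocking point-to-point communication, so that the compiler generates message-passing code with more computation overlapping communication. Experimental results show that the algorithm hides more of the non-blocking point-to-point communication overhead and clearly improves the speedup of message-passing programs.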

5.
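Satellite networks have dynamically changing topologies, nodes with limited capability, and nodes that themselves move at high speed. To reduce network management latency and lower the communication load between the management station and on-board network management agents, a satellite network management cluster generation algorithm based on the contract net model is proposed, and communication primitives for the satellite network management cluster are designed and implemented. The primitives follow the ASN.1 standard. Simulation results show that the primitives support the establishment of satellite network management clusters.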

6.
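Collective communication performance is one of the important factors affecting the parallel efficiency of parallel programs, yet evaluation and theoretical analysis of the different classes of collective communication on large-scale parallel computers remain scarce, and many applications design and use their communication modules poorly. On a domestically produced parallel computer platform, the IMB benchmark was used to analyze the performance of typical MPI (Message Passing Interface) collective operations, and theoretical fits were made using existing communication models and algorithms. The results show that performance differs greatly across classes of MPI collective operations, and that at very large scale the performance of many collectives deviates substantially from theory. On the one hand this reveals shortcomings in existing theory and models; on the other it shows that much remains to be done, both in optimizing collective communication and in designing application communication modules around the characteristics of collective communication.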

7.
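Statically detecting deadlocks in the synchronous communication of MPI programs is difficult and usually requires building a program model. The sequential model is the basis of all other, more complex models. Using a mapping method, the sequential model is converted into a set of strings, which turns deadlock detection into an equivalent multi-queue string matching problem. On this basis, a static deadlock detection algorithm for the sequential model of MPI synchronous communication is designed and implemented. The algorithm outperforms the usual cycle detection approach and can accommodate dynamic message flows.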
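The idea of treating each process as a queue of operations and matching queue heads can be sketched as follows. This is a simplified simulation under stated assumptions, not the paper's string-mapping algorithm: every process is a straight-line sequence of synchronous operations, each operation is a tuple `("send", peer)` or `("recv", peer)`, and there are no wildcard receives. The function name `has_deadlock` is hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

Op = Tuple[str, int]  # ("send", peer) or ("recv", peer)


def has_deadlock(programs: Sequence[Sequence[Op]]) -> bool:
    """Return True if the synchronous operation queues cannot all drain."""
    queues: List[deque] = [deque(ops) for ops in programs]
    progressed = True
    while progressed:
        progressed = False
        for rank, ops in enumerate(queues):
            if not ops:
                continue
            kind, peer = ops[0]
            if kind != "send" or not queues[peer]:
                continue
            # A synchronous send completes only when the peer's next
            # operation is the matching receive.
            if queues[peer][0] == ("recv", rank):
                ops.popleft()
                queues[peer].popleft()
                progressed = True
    # Any operation left unmatched means some process blocks forever.
    return any(queues)
```

Two processes that both send first, `[("send", 1), ("recv", 1)]` and `[("send", 0), ("recv", 0)]`, are reported as deadlocked, while swapping the order in one of them lets both queues drain.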

8.
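Alltoall is an important class of MPI (Message Passing Interface) collective communication and a major factor in the parallel efficiency of many parallel programs. However, evaluation and theoretical analysis of Alltoall on large-scale parallel computers remain scarce, which leads many applications to design and use their communication modules poorly. The work first tests and analyzes basic MPI communication performance and finds that performance fluctuation grows with the number of MPI processes, and that this fluctuation stems from network contention. A network contention factor is therefore introduced into the traditional Alltoall performance model, so that the new model accounts for communication contention in addition to the traditional bandwidth and latency parameters. Tests on a domestically produced parallel computer platform show that the new model with network contention predicts Alltoall performance fairly accurately and reflects the effect of contention overhead on Alltoall performance.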
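The abstract does not give the model's formula, so the following is only a sketch of what a latency/bandwidth model extended with a contention term might look like. It assumes a pairwise-exchange Alltoall with P − 1 steps and a hypothetical multiplier `contention` (at least 1) that inflates the bandwidth term; both the functional form and the parameter names are assumptions, not the paper's model.

```python
def alltoall_time(nprocs: int, msg_bytes: float, latency_s: float,
                  bandwidth_bytes_per_s: float, contention: float = 1.0) -> float:
    """Estimate Alltoall time as (P - 1) * (latency + contention * m / bandwidth).

    Illustrative assumption only: contention = 1.0 reduces to a plain
    latency/bandwidth model; larger values model a congested network.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    if contention < 1.0:
        raise ValueError("contention factor must be at least 1.0")
    steps = nprocs - 1  # each rank exchanges with every other rank once
    return steps * (latency_s + contention * msg_bytes / bandwidth_bytes_per_s)
```

In a fitting workflow, `latency_s` and `bandwidth_bytes_per_s` would come from point-to-point measurements, and `contention` would be calibrated against measured Alltoall times at several process counts.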

9.
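MPI version 3.0 added non-blocking collective communication. Non-blocking collectives combine the characteristics of non-blocking and collective communication: compared with blocking collectives they have lower synchronization overhead and allow more overlap of computation and communication, which improves performance. Taking broadcast as an example, the paper describes in detail the different algorithms used to implement broadcast, compares the underlying control and management methods of non-blocking and blocking broadcast, presents an experimental analysis, and proposes improvements to the implementation.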

10.
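For the structured-grid part of MuSiC-CCASSIM, a two-dimensional/axisymmetric high-accuracy computational fluid dynamics method for compressible multiphase flow, a parallel domain decomposition method is designed. For communicating boundary data between processors, parallel algorithms using blocking and non-blocking communication are designed. To reduce communication overhead, a hybrid MPI/OpenMP parallel optimization algorithm is designed. Tests were run on the Tianhe-2 supercomputer with a fixed grid size of 625 × 250 per core and up to 8,192 cores. The test data show that programs using the hybrid MPI/OpenMP algorithm, the pure MPI non-blocking algorithm, and the pure MPI blocking algorithm reach average parallel efficiencies of 86%, 83%, and 77% respectively, and that all three algorithms scale well.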

11.
This paper deals with a technique that can support the re-engineering of parallel programs based on point-to-point communication primitives by detecting typical process interaction patterns in the code. Pattern detection is performed by the static analysis of the parallel program and by solving Diophantine sets of inequalities. The objective is to determine process interactions and to classify them into a set of commonly occurring interaction patterns.

Information on the patterns contained in the program, besides being useful for code comprehension and documentation, makes it possible to obtain more structured and, possibly, efficient versions of the same programs through the use of collective communication constructs. These are primitives for collective data movement or computation often available in current message-passing programming environments.

After the presentation of the basic program analysis technique, several examples involving the detection of common communication patterns are shown. Then the structure of PPAR, a prototype tool that allows the analysis of parallel programs written in Fortran 77 with calls to PVM or MPI unstructured communication primitives, is outlined, and conclusions are drawn.


12.
A distributed program is a collection of several processes which execute concurrently, possibly in different nodes of a distributed system, and which cooperate with each other to realize a common goal. In this paper, we present a design of communication and synchronization primitives for distributed programs. The primitives are designed such that they can be provided by a kernel of a distributed operating system. An important feature of the design is that the configuration of a process, i.e., the identities of the processes with which it communicates, is specified separately from the computation performed by the process. This permits easy configuration and reconfiguration of processes. We identify different kinds of communication failures, and provide distinct mechanisms for handling them. The communication primitives are not atomic actions. To enable the construction of atomic actions, two new program components, atomic agent and manager, are introduced. These are devoid of policy decisions regarding concurrency control and atomic commitment. We introduce the notion of a conflicts relation, with which a designer can construct either an optimistic or a pessimistic concurrency control scheme. The design also incorporates primitives for constructing nested atomic actions.

13.
Epsilon is a testbed for monitoring distributed applications involving heterogeneous computers, including microcomputers, interconnected by a local area network. Such a hardware configuration is usual but raises difficulties for the programmer. First, the interprocess communication mechanisms provided by the operating systems are rather cumbersome to use. Second, they are different from one system to another. Third, the programmer of distributed applications should not worry about system and/or network aspects that are not relevant for the application level. The authors present the solution chosen in Epsilon. A set of high-level communication primitives has been designed and implemented to provide the programmer with an interface independent of the operating system and of the underlying interprocess communication facilities. A program participating in a distributed application can be executed on any host without any change in the source code except for host names.

14.
Optimizing Message Passing Interface (MPI) point-to-point communication for large messages is of paramount importance since most communications in MPI applications are performed by such operations. Remote Direct Memory Access (RDMA) allows one-sided data transfer and provides great flexibility in the design of efficient communication protocols for large messages. However, achieving high point-to-point communication performance on RDMA-enabled clusters is challenging due to both the complexity of communication protocols and the impact of the protocol invocation scenario on the performance of a given protocol. In this work, we analyze existing protocols and show that they are not ideal in many situations, and propose to use protocol customization, that is, different protocols for different situations, to improve MPI performance. More specifically, by leveraging the RDMA capability, we develop a set of protocols that can provide high performance for all protocol invocation scenarios. Armed with this set of protocols that can collectively achieve high performance in all situations, we demonstrate the potential of protocol customization by developing a trace-driven toolkit that allows the appropriate protocol to be selected for each communication in an MPI application to maximize performance. We evaluate the performance of the proposed techniques using micro-benchmarks and application benchmarks. The results indicate that protocol customization can outperform traditional communication schemes by a large margin in many situations.

15.
We have performed benchmarks of two three-dimensional parallel Particle-In-Cell (PIC) codes that are similar but have quite different communication patterns on different computational Grids. One is an electrostatic code with only electrons, based on the three-dimensional skeleton PIC code, which employs an FFT Poisson solver that uses collective communication patterns. The other is the TRISTAN (TRI-dimensional STANford) code parallelized with MPI, an electromagnetic full particle code, which uses a field solver that only requires point-to-point neighbor communication patterns. We present the mpptest benchmarks on cluster-based computational Grids, where both the basic point-to-point communication patterns and the basic collective communication patterns used in these PIC codes are tested. The results of these benchmarks allow us to quantify and understand the scalability of both communication patterns on the Grids. The present results show that the parallelized TRISTAN code (without all-to-all collective communication) is more scalable than the parallelized skeleton PIC code (with all-to-all collective communication) in cluster-based computational Grid systems where communication performance is poor.

16.
We introduce a compact hierarchical procedural model that combines feature-based primitives to describe complex terrains with varying level of detail. Our model is inspired by skeletal implicit surfaces and defines the terrain elevation function by using a construction tree. Leaves represent terrain features and they are generic parametrized skeletal primitives, such as mountains, ridges, valleys, rivers, lakes or roads. Inner nodes combine the leaves and subtrees by carving, blending or warping operators. The elevation of the terrain at a given point is evaluated by traversing the tree and by combining the contributions of the primitives. The definition of the tree leaves and operators guarantees that the resulting elevation function is Lipschitz, which speeds up the sphere tracing used to render the terrain. Our model is compact and allows for the creation of large terrains with a high level of detail using a reduced set of primitives. We show the creation of different kinds of landscapes and demonstrate that our model allows efficient control of the shape and distribution of landform features.

17.
We study, from the expressiveness point of view, the impact of synchrony in the communication primitives that arise when combining some common and useful programming features like arity of data, communication medium and possibility of pattern matching. For some primitives, we show how their synchronous version can be encoded in their asynchronous counterpart via a fully abstract encoding, thus proving that the two versions have the same expressive power. For the remaining primitives, we prove that no ‘reasonable’ encoding can exist, thus proving that synchrony adds expressiveness to the language.
