
1.

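贾明飞 董渭清 黄泳翔 侯宗浩 《计算机工程与应用》2003,39(14):126-129

To address the large body of existing unstructured MPI programs, this paper proposes a method for converting point-to-point communication primitives in MPI programs into collective communication primitives. The basic idea is to analyze the internal structure of unstructured message-passing parallel code, build a system of Diophantine inequalities, use the Omega library to compute the set of communication patterns of the point-to-point code segments, and then apply data exchange analysis to determine the corresponding collective communication primitive and substitute it.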

2.

Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms

Teng Ma George Bosilca Aurelien Bouteiller Jack J. Dongarra 《Journal of Parallel and Distributed Computing》2013

Multicore Clusters, which have become the most prominent form of High Performance Computing (HPC) systems, challenge the performance of MPI applications with non-uniform memory accesses and shared cache hierarchies. Recent advances in MPI collective communications have alleviated the performance issue exposed by deep memory hierarchies by carefully considering the mapping between the collective topology and the hardware topologies, as well as the use of single-copy kernel assisted mechanisms. However, on distributed environments, a single level approach cannot encompass the extreme variations not only in bandwidth and latency capabilities, but also in the capability to support duplex communications or operate multiple concurrent copies. This calls for a collaborative approach between multiple layers of collective algorithms, dedicated to extracting the maximum degree of parallelism from the collective algorithm by consolidating the intra- and inter-node communications.

3.

KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework

Brice Goglin Stéphanie Moreaud 《Journal of Parallel and Distributed Computing》2013

The multiplication of cores in today’s architectures raises the importance of intra-node communication in modern clusters and their impact on the overall parallel application performance. Although several proposals focused on this issue in the past, there is still a need for a portable and hardware-independent solution that addresses the requirements of both point-to-point and collective MPI operations inside shared-memory computing nodes.

4.

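Research on a Thread-Based MPI Communication Accelerator

刘志强 宋君强 卢风顺 赵娟 《计算机学报》2011,34(1):154-164

To build a more efficient MPI runtime environment for multicore systems, this paper proposes a thread-based MPI accelerator called MPIActor. MPIActor is a transparent middleware that assists a conventional MPI library; users can choose at compile time whether to use it in a single-threaded MPI program. With MPIActor, the MPI processes within each node are mapped to multiple threads of a single process, so that intra-node communication can be carried out through light...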

5.

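Research on Process Patterns Based on XPDL

王庆国 步丰林 《计算机工程》2004,30(15):80-82

By extending the semantics of the XPDL language, a technique for describing process patterns is established. The extensions to XPDL semantics include defining products and resources, defining process patterns, implementing templates that describe process patterns, implementing templates that describe activities, and defining graphical semantics for entities. The article describes each of these aspects.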

6.

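An Improved Method for Optimizing MPI Program Performance

柯鹏 聂鑫 《现代计算机》2011,(18):3-6

On distributed-memory systems, MPI has proven to be an ideal parallel programming model. MPI is a message-passing parallel programming model in which processes communicate by calling library functions, so the efficiency of the communication code in an MPI parallel program directly affects the program's performance. The MPI parallel implementation of DNS is improved twice, first by replacing point-to-point communication functions with collective communication functions, and then by using derived datatypes and creating new communicators. Experiments are then used to present a general approach and method for optimizing MPI parallel programs.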
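One reason replacing point-to-point calls with a collective can help is that collective implementations are free to use tree-shaped schedules. The sketch below is a pure-Python simulation, not MPI code and not the method of the paper above: it compares a root that sends to every rank in turn with a binomial-tree broadcast. The function names `linear_broadcast_rounds` and `binomial_broadcast_schedule` are hypothetical, and the assumption that a given MPI library uses a binomial tree for its broadcast is only illustrative.

```python
def linear_broadcast_rounds(num_procs):
    """Sequential sends needed when rank 0 sends to every other rank itself."""
    return max(num_procs - 1, 0)


def binomial_broadcast_schedule(num_procs):
    """Return the rounds of a binomial-tree broadcast rooted at rank 0.

    Each round is a list of (sender, receiver) pairs that can proceed
    in parallel. Illustrative only; real MPI libraries choose their own
    algorithms.
    """
    schedule = []
    have = 1  # ranks 0 .. have-1 already hold the data
    while have < num_procs:
        # Every rank that holds the data forwards it to one new rank.
        round_pairs = [(src, src + have)
                       for src in range(have)
                       if src + have < num_procs]
        schedule.append(round_pairs)
        have *= 2
    return schedule


if __name__ == "__main__":
    procs = 8
    print("linear sends :", linear_broadcast_rounds(procs))
    print("tree rounds  :", len(binomial_broadcast_schedule(procs)))
```

For 8 processes the linear version needs 7 sequential sends from the root, while the tree finishes in 3 parallel rounds; whether this translates into a measurable speedup depends on message size and the interconnect.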

7.

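A New MPI Allgather Algorithm and Its Implementation and Performance Analysis on Teraflops Cluster Systems (total citations: 4; self-citations: 0; citations by others: 4)

陈靖 张云泉 张林波 袁伟 《计算机学报》2006,29(5):808-814

A new MPI Allgather algorithm, the neighbor exchange algorithm, is presented. The proposed concept of average logical communication distance, together with its formula, effectively measures communication locality. Analysis shows that, among four MPI Allgather algorithms, the neighbor exchange and ring algorithms both have optimal communication locality. Performance tests and analysis of the four MPI Allgather algorithms on the teraflops clusters DeepComp 6800 (深腾6800) and Dawning 4000A (曙光4000A) show that the neighbor exchange algorithm performs best for long messages, is unstable for medium-length messages, and is slower than recursive doubling and the Bruck algorithm for short messages.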
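The neighbor exchange idea can be sketched as a small pure-Python simulation. This follows the commonly described form of the algorithm (an even number of processes, P/2 steps, partners alternating between the two ring neighbors) and is an assumption about the details rather than a transcription of the paper; the function name `neighbor_exchange_allgather` is hypothetical, and blocks are represented simply by the rank that owns them.

```python
def neighbor_exchange_allgather(num_procs):
    """Simulate a neighbor-exchange allgather on an even number of ranks.

    Returns (held, steps): held[r] lists the block ids rank r owns at the
    end, and steps is the number of communication steps performed.
    """
    if num_procs < 2 or num_procs % 2 != 0:
        raise ValueError("this sketch assumes an even number of processes")

    held = [[rank] for rank in range(num_procs)]      # blocks owned so far
    outgoing = [[rank] for rank in range(num_procs)]  # blocks to send next
    steps = num_procs // 2

    for step in range(steps):
        received = []
        for rank in range(num_procs):
            # Even ranks use the right neighbor on even steps and the left
            # neighbor on odd steps; odd ranks do the opposite, so partners
            # always pair up.
            toward_right = (rank % 2 == 0) == (step % 2 == 0)
            partner = (rank + 1 if toward_right else rank - 1) % num_procs
            received.append(list(outgoing[partner]))
        for rank in range(num_procs):
            held[rank].extend(received[rank])
            # After the first step a rank forwards its whole pair of blocks;
            # afterwards it forwards only what it received most recently.
            outgoing[rank] = list(held[rank]) if step == 0 else received[rank]
    return held, steps


if __name__ == "__main__":
    blocks, steps = neighbor_exchange_allgather(6)
    print(steps, [sorted(b) for b in blocks])
```

Every exchange is with an adjacent rank, which is the locality property the abstract highlights, and after the first step each message carries two blocks, which suggests why the scheme favors long messages.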

8.

A Speculative and Adaptive MPI Rendezvous Protocol Over RDMA-enabled Interconnects

Mohammad J. Rashti Ahmad Afsahi 《International journal of parallel programming》2009,37(2):223-246

Overlapping computation with communication is a key technique to conceal the effect of communication latency on the performance of parallel applications. Message Passing Interface (MPI) is a widely used message passing standard for high performance computing. One of the most important factors in achieving a good level of overlap is the MPI ability to make progress on outstanding communication operations. In this paper, we propose a novel speculative MPI Rendezvous protocol that uses RDMA Read and RDMA Write to effectively improve communication progress and consequently the overlap ability. Performance results based on a modified MPICH2 implementation over 10-Gigabit iWARP Ethernet reveal a significant (80–100%) improvement in receiver side overlap and progress ability. We have also observed up to 30% improvement in application wait time for some NPB applications as well as the RADIX application. For applications that do not benefit from this protocol, an adaptation mechanism is used to stop the speculation to effectively reduce the protocol overhead.

9.

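A Multi-Level Segmented Reduction Algorithm for MPI on the Nehalem Platform

邹金安 刘志强 廖蔚 《小型微型计算机系统》2012,33(4):733-738

Based on a thread-based MPI environment, this paper proposes a multi-level segmented reduction algorithm (HSRA) suited to long-message reduction on the Nehalem platform. HSRA takes the architectural characteristics of Nehalem systems into account and performs intra-node reduction in two steps, reduction within a processor and reduction across processors, so that it needs relatively few remote memory accesses while distributing the computational load evenly. HSRA is first designed and implemented in MPIActor's reduction algorithm framework, and its cost is analyzed in terms of memory accesses. It is then compared with single-level segmented reduction and three other existing shared-memory intra-node reduction algorithms. Finally, the algorithm is validated on a real system with IMB (Intel MPI Benchmark). The experimental results show that it is an efficient algorithm for long-message intra-node reduction on Nehalem systems.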

10.

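Analysis and Extension of MPE Graphics Functions in the MPI Environment

罗秋明 王梅 李晶 《计算机工程与应用》2006,42(19):87-89

Cluster parallel computations often need to output their results graphically. Although MPE in the MPICH package includes basic parallel graphics output, its drawing functions are extremely simple and cannot display the output of applications such as image processing. This article therefore analyzes the complex core code of the MPE graphics functions and extends the MPE library, adding new graphics library functions for more complex graphics in order to handle such output, reduce programming difficulty, and improve efficiency. The extension method is equally applicable to visualizing the results of MPI parallel applications in other fields.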

11.

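MPI-Based Collective Communication for Parallel Computing and Its Application (total citations: 4; self-citations: 0; citations by others: 4)

罗省贤 李录明 《计算机应用》2003,23(6):51-53

The serial LSQR algorithm, which effectively solves large sparse matrix equations, is analyzed for parallelization, and a parallel LSQR algorithm is designed and implemented on a distributed-memory parallel system using the collective communication mechanism of the portable message-passing standard MPI. The parallel algorithm and program have been applied effectively to tomographic inversion of seismic near-surface models.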

12.

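Numerical Application of the MPI Network Communication Model (total citations: 3; self-citations: 0; citations by others: 3)

曹骥 袁勇 《计算机工程》2003,29(16):13-15

The parallel communication performance model of the MPI parallel support environment is discussed, several performance metrics are measured for point-to-point and group communication, and statistical models of these metrics are derived to serve as a basis for evaluating the feasibility and scalability of parallel computing for engineering problems.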

13.

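Performance Optimization of MPI All-to-All Communication

罗秋明 王梅 雷海军 张红兵 《计算机工程与应用》2006,42(16):127-128,170

The MPI all-to-all operation is one of the communication operations commonly used in simulation on cluster computers, where it exchanges the intermediate results of the previous step among compute nodes. Because the dense many-to-many traffic of all-to-all communication easily causes blocking at the receivers and thereby increases communication latency, the operation is optimized by organizing it into multiple regular, ordered communication rounds arranged in a ring. This yields a clear performance improvement for all-to-all exchanges of large data volumes.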
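A ring-ordered all-to-all of the kind the abstract describes can be illustrated with a short pure-Python schedule generator. The abstract does not spell out the exact ordering, so the schedule here (in round k, rank i sends to rank (i + k) mod P) is an assumption chosen because it guarantees one sender per receiver in each round; `ring_alltoall_schedule` and `max_incoming_per_round` are hypothetical helper names.

```python
def ring_alltoall_schedule(num_procs):
    """Return P-1 rounds; in round k every rank i sends to (i + k) % P."""
    return [[(src, (src + k) % num_procs) for src in range(num_procs)]
            for k in range(1, num_procs)]


def max_incoming_per_round(schedule, num_procs):
    """Largest number of messages any single rank receives in one round."""
    worst = 0
    for round_pairs in schedule:
        incoming = [0] * num_procs
        for _, dst in round_pairs:
            incoming[dst] += 1
        worst = max(worst, max(incoming))
    return worst


if __name__ == "__main__":
    procs = 5
    rounds = ring_alltoall_schedule(procs)
    for k, pairs in enumerate(rounds, start=1):
        print("round", k, pairs)
    print("max receivers load:", max_incoming_per_round(rounds, procs))
```

Because each round is a cyclic shift, no receiver is ever the target of two senders at once, which is the receiver-side contention the ordered scheme is meant to avoid.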

14.

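A Managed Message Model Suitable for Many-Core MPI

张立博 漆锋滨 卞卫峰 姜小成 《计算机工程与科学》2009,31(Z1)

Many-core processors characterized by heterogeneous multicore designs have become the mainstream direction of processor technology, and how to implement an efficient, usable MPI on many-core processors is gradually becoming a research focus. This paper first reviews the state of research on many-core MPI, then builds on existing results to propose a message model suitable for many-core MPI, and finally discusses the outlook for MPI on many-core processors.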

15.

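Design of an Automatic Bus Arrival Announcement System

余杨广 迟振祥 王建文 《单片机与嵌入式系统应用》2015,(5)

An automatic bus arrival announcement system is designed. The overall design is presented, the system's components and operating principles are explained, and the hardware architecture, software architecture, and communication protocol are described in detail. On-site installation, commissioning, and use demonstrate that the system performs well and is reliable.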

16.

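A Brief Analysis of MPI Cluster Communication Technology

CHEN Yan HAO Li-rui 《数字社区&智能家居》2008,(23)

Cluster systems are briefly introduced and the principle by which they are used for parallel computing is explained. The focus is on the MPI parallel environment and its communication techniques, with an analysis of the basic patterns in MPI parallel programs and the communication techniques they use. Finally, the outlook for cluster systems that host MPI parallel environments is discussed.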

17.

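An MPI-Based Parallel Volume Rendering Algorithm (total citations: 5; self-citations: 0; citations by others: 5)

梁峰 鲁强 曾绍群 《计算机工程》2005,31(13):171-173

A parallel 3D reconstruction algorithm implemented on the MPI parallel programming platform is introduced. The algorithm adopts a Master-Slave parallel computing model. Given the characteristics of ray casting, it partitions tasks in image space to reduce computation time and uses a task pool to achieve dynamic load balancing. Reconstruction of the head and foot data sets of Virtual Chinese Human Female No. 1 (VCH-FI) shows that the algorithm scales well with both task size and node count.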

18.

LogGPO: An accurate communication model for performance prediction of MPI programs

WenGuang Chen JiDong Zhai Jin Zhang WeiMin Zheng 《中国科学F辑(英文版)》2009,52(10):1785-1791

Message passing interface (MPI) is the de facto standard in writing parallel scientific applications on distributed memory systems. Performance prediction of MPI programs on current or future parallel systems can help to find system bottleneck or optimize programs. To effectively analyze and predict performance of a large and complex MPI program, an efficient and accurate communication model is highly needed. A series of communication models have been proposed, such as the LogP model family, which assume th...

19.

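A Discussion of an Industrial Ethernet Communication Model for Process Control

彭杰 卢淋芗 应启戛 《微计算机信息》2007,23(10):34-35

REPC, an Ethernet communication model for process control built on switched Ethernet and IEEE 802.1Q/P technology, is proposed and analyzed.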

20.

Fat-tree routing and node ordering providing contention free traffic for MPI global collectives

Eitan Zahavi 《Journal of Parallel and Distributed Computing》2012

As the size of High Performance Computing clusters grows, so does the probability of interconnect hot spots that degrade the latency and effective bandwidth the network provides. This paper presents a solution to this scalability problem for real life constant bisectional-bandwidth fat-tree topologies. It is shown that maximal bandwidth and cut-through latency can be achieved for MPI global collective traffic. To form such a congestion-free configuration, MPI programs should utilize collective communication, MPI-node-order should be topology aware, and the packet routing should match the MPI communication patterns. First, we show that MPI collectives can be classified into unidirectional and bidirectional shifts. Using this property, we propose a scheme for congestion-free routing of the global collectives in fully and partially populated fat trees running a single job. The no-contention result is then obtained for multiple jobs running on the same fat-tree by applying some job size and placement restrictions. Simulation results of the proposed routing, MPI-node-order and communication patterns show no contention which provides a 40% throughput improvement over previously published results for all-to-all collectives.