期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Performance Modeling and Evaluation of MPI

《Journal of Parallel and Distributed Computing》2001,61(2):202-223

Users of parallel machines need to have a good grasp for how different communication patterns and styles affect the performance of message-passing applications. LogGP is a simple performance model that reflects the most important parameters required to estimate the communication performance of parallel computers. The message passing interface (MPI) standard provides new opportunities for developing high performance parallel and distributed applications. In this paper, we use LogGP as a conceptual framework for evaluating the performance of MPI communications on three platforms: Cray-Research T3D, Convex Exemplar 1600SP, and a network of workstations (NOW). We develop a simple set of communication benchmarks to extract the LogGP parameters. Our objective in this is to compare the performance of MPI communication on several platforms and to identify a performance model suitable for MPI performance characterization. In particular, two problems are addressed: how LogGP quantifies MPI performance and what extra features are required for modeling MPI, and how MPI performance compare on the three computing platforms: Cray Research T3D, Convex Exemplar 1600SP, and workstations clusters. 相似文献

2.

PVM应用移植到MPI问题的探讨

朱建秋周丽娟《计算机工程与应用》1999,35(3):27-29

消息传递方式是广泛应用于一些并行机,特别是分布存储并行机的一种模式。ＰＶＭ（ＰａｒａｌｌｅｌＶｉｒｔｕａｌＭａｃｈｉｎｅ）和ＭＰＩ（ＭｅｓｓａｇｅＰａｓｓｉｎｇＩｎｔｅｒｆａｃｅ）都是目前是广受欢迎的基于消息传递的并行程序库,其中ＰＶＭ的消息传递接口,因其简单性,而没有给用户最大的灵活性以实现最佳的性能：为此,消息传递标准的讨论会工作组制定了消息传递接口ＭＰＩ标准,为ＰＶＭ实现最佳性能提供了可能。该文通过对ＰＶＭ和ＭＰＩ的比较,指出了从ＰＶＭ应用移植到ＭＰＩ应用时有利的方面和潜在的缺陷。如果一个应用程序能避开这些缺陷的影响,那么它就能够从移植中提高通信的性能,从而提高其分布式计算的性能。相似文献

3.

基于CELL宽带引擎架构的MPI研究与实现* 总被引：1，自引：0，他引：1

徐祯孙济洲于策亓大志张旭明《计算机应用研究》2010,27(7):2526-2529

研究了在CBEA上移植MPI消息传递编程模型和标准接口的可行性,并利用IBM CELL SDK 3.0实现了一组常用的MPI编程接口。实验结果表明,该组MPI接口可满足CBEA上应用开发的数据传输性能要求,并且其性能已接近现有DMA数据传输模式。该组MPI接口为CELL应用开发人员提供了一种通用编程接口解决方案。相似文献

4.

一种优化MPI程序性能的改进方法

柯鹏聂鑫《现代计算机》2011,(18):3-6

在分布式存储系统上,MPI已被证实是理想的并行程序设计模型。MPI是基于消息传递的并行编程模型,进程间的通信是通过调用库函数来实现的,因此MPI并行程序中,通信部分代码的效率对该并行程序的性能有直接的影响。通过用集群通信函数替代点对点通信函数以及通过派生数据类型和建立新通信域这两种方式,两次改进DNS的MPI并行程序实现,并通过实验给出一个优化MPI并行程序的一般思路与方法。相似文献

5.

PC机群上JIAJIA与MPI的比较 总被引：3，自引：2，他引：3

下载免费PDF全文

胡明昌史岗胡伟武唐志敏张福新《软件学报》2003,14(7):1187-1194

对JIAJIA和MPI (message passing interface)是进行了比较.JIAJIA和MPI分别代表共享存储和消息传递的编程模式.MPI显式进行数据传输,编程复杂;JIAJIA由底层维护数据一致性,并附加提供简单的消息传递函数,编程容易、灵活.JIAJIA分配共享内存时开销较大,初始化时间比MPI长.提出了一个关于并行加速比与进程数目之间关系的近似经验公式,推出JIAJIA和MPI性能差距随着进程数目的增多而增大的结论.测试结果表明,大部分应用程序的JIAJIA和MPI版本的并行性能差距不超过10%.对于通信量很小的应用程序,其JIAJIA和MPI的性能差距较小,而通信量本身较大的应用程序,其JIAJIA和MPI的性能差距主要取决于运行时产生的实际通信量. 相似文献

6.

Exploiting Distributed-Memory and Shared-Memory Parallelism on Clusters of SMPs with Data Parallel Programs

Benkner Siegfried Sipkova Viera 《International journal of parallel programming》2003,31(1):3-19

Clusters of SMPs are hybrid-parallel architectures that combine the main concepts of distributed-memory and shared-memory parallel machines. Although SMP clusters are widely used in the high performance computing community, there exists no single programming paradigm that allows exploiting the hierarchical structure of these machines. Most parallel applications deployed on SMP clusters are based on MPI, the standard API for distributed-memory parallel programming, and thus may miss a number of optimization opportunities offered by the shared memory available within SMP nodes. In this paper we present extensions to the data parallel programming language HPF and associated compilation techniques for optimizing HPF programs on clusters of SMPs. The proposed extensions enable programmers to control key aspects of distributed-memory and shared-memory parallelization at a high-level of abstraction. Based on these language extensions, a compiler can adopt a hybrid parallelization strategy which closely reflects the hierarchical structure of SMP clusters by automatically exploiting shared-memory parallelism based on OpenMP within cluster nodes and distributed-memory parallelism utilizing MPI across nodes. We describe the implementation of these features in the VFC compiler and present experimental results which show the effectiveness of these techniques. 相似文献

7.

Concurrent programming constructs for parallel MPI applications

Tobias Berka Giorgos Kollias Helge Hagenauer Marian Vajteršic Ananth Grama 《The Journal of supercomputing》2013,63(2):385-406

Concurrency and parallelism have long been viewed as important, but somewhat distinct concepts. While concurrency is extensively used to amortize latency (for example, in web- and database-servers, user interfaces, etc.), parallelism is traditionally used to enhance performance through execution on multiple functional units. Motivated by an evolving application mix and trends in hardware architecture, there has been a push toward integrating traditional programming models for concurrency and parallelism. Use of conventional threads APIs (POSIX, OpenMP) with messaging libraries (MPI), however, leads to significant programmability concerns, owing primarily to their disparate programming models. In this paper, we describe a novel API and associated runtime for concurrent programming, called MPI Threads (MPIT), which provides a portable and reliable abstraction of low-level threading facilities. We describe various design decisions in MPIT, their underlying motivation, and associated semantics. We provide performance measurements for our prototype implementation to quantify overheads associated with various operations. Finally, we discuss two real-world use cases: an asynchronous message queue and a parallel information retrieval system. We demonstrate that MPIT provides a versatile, low overhead programming model that can be leveraged to program large parallel ensembles. 相似文献

8.

Parallel programming model for the Epiphany many-core coprocessor using threaded MPI

《Microprocessors and Microsystems》2016

The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point calculations as well as parallel scalability. Yet despite the interesting architectural features, a compelling programming model has not been presented to date. This paper demonstrates an efficient parallel programming model for the Epiphany architecture based on the Message Passing Interface (MPI) standard. Using MPI exploits the similarities between the Epiphany architecture and a conventional parallel distributed cluster of serial cores. Our approach enables MPI codes to execute on the RISC array processor with little modification and achieve high performance. We report benchmark results for the threaded MPI implementation of four algorithms (dense matrix–matrix multiplication, N-body particle interaction, five-point 2D stencil update, and 2D FFT) and highlight the importance of fast inter-core communication for the architecture. 相似文献

9.

MPI网络通信模型的数值应用 总被引：3，自引：0，他引：3

曹骥袁勇《计算机工程》2003,29(16):13-15

讨论并行支撑环境MPI的并行通信性能模型，测试了点对点和组通信下的若干性能指标，归纳出这些性能指标的统计模型，以作为工程问题并行计算可行性和可扩充性评价的基础。相似文献

10.

An integrated fine-grain runtime system for MPI

Humaira Kamal Alan Wagner 《Computing》2014,96(4):293-309

Fine-grain MPI (FG-MPI) extends the execution model of MPI to allow for interleaved execution of multiple concurrent MPI processes inside an OS-process. It provides a runtime that is integrated into the MPICH2 middleware and uses light-weight coroutines to implement an MPI-aware scheduler. In this paper we describe the FG-MPI runtime system and discuss the main design issues in its implementation. FG-MPI enables expression of function-level parallelism, which along with a runtime scheduler, can be used to simplify MPI programming and achieve performance without adding complexity to the program. As an example, we use FG-MPI to re-structure a typical use of non-blocking communication and show that the integrated scheduler relieves the programmer from scheduling computation and communication inside the application and brings the performance part outside of the program specification into the runtime. 相似文献

11.

KD60集群消息传递接口群集通信算法优化

郑启龙汪睿周寰《计算机应用》2011,31(6):1453-1457

大规模集群已经发展到多核的时代,多核架构对并行计算提出了新的要求。消息传递接口(MPI)是最常用的并行编程模型,而群集通信又是MPI中的重要组成部分。研究高效的群集通信算法对并行计算效率的提升有着重要的作用。KD60平台是采用首款国产多核芯片——龙芯3号搭建的国产万亿次多核集群。首先分析了KD60平台多核集群的体系特征以及多核架构下通信具有的层次性特征;然后分析原有群集通信算法实现原理及其不足;最后以广播为例,在原有算法基础上,采用一种基于片上多核(CMP)架构改进算法,改变原有算法通信模式,同时结合实验平台KD60体系特征,对算法做了体系相关优化。实验结果表明,改进算法能够很好地利用多核结构的特点,提高了群集通信广播算法的性能。相似文献

12.

基于Docker的MPI和OpenMP混合编程

赵博颖肖鹏张力《计算机与现代化》2018,(5):60

针对当前搭建集群并行系统复杂且耗时等问题,提出基于Docker搭建并行系统。介绍轻量级虚拟化技术Docker的核心概念和基本架构,并基于Docker技术在Linux平台上搭建集群并行开发环境。简要阐述并行计算的思想,叙述MPI和OpenMP并行计算的基本概念和特点,针对矩阵并行乘法的算法建立MPI和OpenMP的混合编程模型,并给出混合编程模型与MPI并行编程模型以及OpenMP并行编程模型的性能对比,分析出现差异的原因。基于该混合编程模型比较Docker与传统物理机两者搭建的并行系统的并行效率。相似文献

13.

MPI并行程序设计的负载平衡实现方法 总被引：1，自引：0，他引：1

陆克中林晓辉《微计算机信息》2007,23(15):226-227

MPI是目前集群系统中最重要的并行编程工具,它采用消息传递的方式实现并行程序间通信。在MPI并行程序设计中实现负载平衡有着重要的意义,可以减少运行时间,提高MPI并行程序的性能。负载平衡又可分为静态负载平衡和动态负载平衡,对于静态负载平衡,提出了一种分配任务的算法,可有效地按照节点的计算能力,在节点间分配任务;对于动态负载平衡,提出了一种在MPI并行程序中实现的方法,可有效地根据节点的负载情况,在节点间迁移任务。相似文献

14.

LogGPO: An accurate communication model for performance prediction of MPI programs

WenGuang Chen JiDong Zhai Jin Zhang WeiMin Zheng 《中国科学F辑(英文版)》2009,52(10):1785-1791

Message passing interface (MPI) is the de facto standard in writing parallel scientific applications on distributed memory systems. Performance prediction of MPI programs on current or future parallel systems can help to find system bottleneck or optimize programs. To effectively analyze and predict performance of a large and complex MPI program, an efficient and accurate communication model is highly needed. A series of communication models have been proposed, such as the LogP model family, which assume th... 相似文献

15.

Distributed media indexing based on MPI and MapReduce

Hisham Mohamed Stéphane Marchand-Maillet 《Multimedia Tools and Applications》2014,69(2):513-537

Web-scale digital assets comprise millions or billions of documents. Due to such increase, sequential algorithms cannot cope with this data, and parallel and distributed computing become the solution of choice. MapReduce is a programming model proposed by Google for scalable data processing. MapReduce is mainly applicable for data intensive algorithms. In contrast, the message passing interface (MPI) is suitable for high performance algorithms. This paper proposes an adapted structure of the MapReduce programming model using MPI for multimedia indexing. Experimental results are done on various multimedia applications to validate our model. The experiments indicate that our proposed model achieves good speedup compared to the original sequential versions, Hadoop and the earlier versions of MapReduce using MPI. 相似文献

16.

一种支持多种访存技术的CBEA片上多核MPI并行编程模型 总被引：1，自引：0，他引：1

冯国富董小社胡冰王旭昊王恩东《计算机学报》2008,31(11)

现有的CBEA(Cell Broadband Engine Architecture)编程模型多侧重于支持类似于流处理的"批量访存"(Bulk Data Transfer)应用,传统非规则访存应用性能较低.文中基于Cell架构提出了一种同时支持"批量访存"与非规则访存应用的MPI并行编程模型,将通信分解在PPE(PowerPC Processing Element)上,拓宽模型的适用范围;在统一访存接口下,通过运行时访存剖分信息指导选择和优化访存以提高计算效率.实验结果表明,文中提出的编程模型支持多种访存模式并具有很好的并行加速比,可获得较同类相关技术30%~50%左右的性能提升. 相似文献

17.

MPI+TBB混合并行编程模型在分子动力学中的应用

白明泽赵文辉豆育升孙世新温迪《计算机应用研究》2012,29(5):1772-1774

为了提高分子动力学模拟在对称多处理(SMP)集群上的计算速度,在分子动力学并行方法中引入MPI+TBB的混合并行编程模型。基于该模型,在分子动力学软件LAMMPS中设计并实现混合并行算法,在节点间采用MPI及空间分解技术实施进程级并行,节点内采用TBB及临界区技术实施线程级并行。在SMP集群中的测试表明,该方法在体系较大以及节点数较多时可以明显减少通信时间,使加速比在纯MPI模型上提高45%。结果表明,MPI+TBB混合并行编程模型可促进分子动力学并行模拟且效率明显提升。相似文献

18.

PC机群上共享存储与消息传递的比较 总被引：7，自引：0，他引：7

下载免费PDF全文

章隆兵吴少刚蔡飞胡伟武《软件学报》2004,15(6):842-849

共享存储和消息传递是目前两种主流的并行编程模型.一般认为,消息传递的可编程性不及共享存储友好.OpenMP是目前共享存储编程的实际工业标准.机群OpenMP系统在机群上提供了OpenMP编程环境,具有易编程和可扩展的特点,但是其性能如何一直是关注的热点.以机群OpenMP系统OpenMP/JIAJIA和典型的消息传递系相似文献

19.

A high-performance MPI implementation on a shared-memory vector supercomputer

《Parallel Computing》1997,22(11):1513-1526

In this article we recount the sequence of steps by which MPICH, a high-performance, portable implementation of the Message-Passing Interface (MPI) standard, was ported to the NEC SX-4, a high-performance parallel supercomputer. Each step in the sequence raised issues that are important for shared-memory programming in general and shed light on both MPICH and the SX-4. The result is a low-latency, very high bandwidth implementation of MPI for the NEC SX-4. In the process, MPICH was also improved in several general ways. 相似文献

20.

基于MPI的图像并行处理方法

孙敏《数字社区&智能家居》2009,(8)

该文阐述了用MPI实现图像并行处理的一般方法,并通过不同通信机制,通信模式,网络性能,数据大小的程序运行结果对比,得到并行程序的一般特点。相似文献