Similar Literature
20 similar documents found (search time: 421 ms)
1.
1. Introduction. Object-oriented programming is one of the most promising software design techniques today. The object-oriented method corresponds to models of the real world, and since the objects in a real-world model are concurrently active, the method is considered to have inherent potential for concurrency. Concurrent object-oriented technology, which combines object-oriented techniques with concurrency, has emerged only in recent years and is still a relatively new research area. Many concurrent object-oriented models have been proposed at home and abroad; reference [1] proposed the Actor model, in which an object, called an actor, is self-contained, interactive, and independent...

2.
Research on a parallel programming environment   (Cited by: 1; self-citations: 0; external citations: 1)
MPI (Message Passing Interface) is a well-known message-passing standard for parallel environments. MPICH is a complete implementation of the MPI 1.2 standard and the most widely used parallel and distributed environment; besides the MPI function library, it includes a programming and runtime environment. This paper briefly describes how to use the Windows version of MPICH to build a Windows-based parallel programming and runtime environment.
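As a hedged illustration, this is the kind of minimal MPI program one would build and launch to verify such an environment; the exact compiler wrapper and launcher invocations (e.g., mpicc and mpiexec) vary across MPICH versions and platforms:

```c
/* Minimal MPI "hello" to verify an MPICH installation. Build and launch
 * commands (e.g., mpicc hello.c && mpiexec -n 4 ./a.out) vary with the
 * MPICH version and platform. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    printf("process %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```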

3.
The problem with threads   (Cited by: 5; self-citations: 0; external citations: 5)
Lee, E.A. Computer, 2006, 39(5): 33-42
For concurrent programming to become mainstream, we must discard threads as a programming model. Nondeterminism should be judiciously and carefully introduced where needed, and it should be explicit in programs. In general-purpose software engineering practice, we have reached a point where one approach to concurrent programming dominates all others: threads, sequential processes that share memory. They represent a key concurrency model supported by modern computers, programming languages, and operating systems. In scientific computing, where performance requirements have long demanded concurrent programming, data-parallel language extensions and message-passing libraries such as PVM, MPI, and OpenMP dominate over threads for concurrent programming. Computer architectures intended for scientific computing often differ significantly from so-called general-purpose architectures.
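A minimal sketch (not from the paper) of the implicit nondeterminism Lee criticizes: two POSIX threads racing on an unsynchronized shared counter can print a different total on every run.

```c
/* Two threads increment a shared counter without synchronization; the
 * final value varies from run to run -- the hidden nondeterminism of the
 * shared-memory threads model. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;            /* shared, unsynchronized on purpose */

static void *inc(void *arg) {
    for (int i = 0; i < 1000000; i++)
        counter++;                  /* data race: load, add, store */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, inc, NULL);
    pthread_create(&b, NULL, inc, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}
```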

4.
As supercomputers scale to 1000 PFlop/s over the next decade, investigating the performance of parallel applications at scale on future architectures and the performance impact of different architecture choices for high-performance computing (HPC) hardware/software co-design is crucial. This paper summarizes recent efforts in designing and implementing a novel HPC hardware/software co-design toolkit. The presented Extreme-scale Simulator (xSim) permits running an HPC application in a controlled environment with millions of concurrent execution threads while observing its performance in a simulated extreme-scale HPC system using architectural models and virtual timing. This paper demonstrates the capabilities and usefulness of the xSim performance investigation toolkit, such as its scalability to 2^27 simulated Message Passing Interface (MPI) ranks on 960 real processor cores, the capability to evaluate the performance of different MPI collective communication algorithms, and the ability to evaluate the performance of a basic Monte Carlo application with different architectural parameters.

5.
The Message Passing Interface (MPI) 3.0 standard includes a significant revision to MPI's remote memory access (RMA) interface, which provides support for one‐sided communication. MPI‐3 RMA is expected to greatly enhance the usability and performance of MPI RMA. We present the first complete implementation of MPI‐3 RMA and document implementation techniques and performance optimization opportunities enabled by the new interface. Our implementation targets messaging‐based networks and is publicly available in the latest release of the MPICH MPI implementation. Using this implementation, we explore the performance impact of new MPI‐3 functionality and semantics. Results indicate that the MPI‐3 RMA interface provides significant advantages over the MPI‐2 interface by enabling increased communication concurrency through relaxed semantics in the interface and additional routines that provide new window types, synchronization modes, and atomic operations. Copyright © 2016 John Wiley & Sons, Ltd.
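As a hedged illustration (not taken from the paper), here is a minimal MPI-3 RMA sketch using one of the new window types and the passive-target synchronization routines the abstract mentions; it assumes an MPI-3 library such as recent MPICH and the unified memory model typical of cache-coherent systems:

```c
/* MPI-3 RMA sketch: each rank writes its id into its right neighbor's
 * window with passive-target synchronization. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *buf;
    MPI_Win win;
    /* New in MPI-3: allocate memory and create the window in one call. */
    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &buf, &win);
    *buf = -1;

    int right = (rank + 1) % size;
    MPI_Win_lock_all(0, win);               /* MPI-3 passive-target mode */
    MPI_Put(&rank, 1, MPI_INT, right, 0, 1, MPI_INT, win);
    MPI_Win_flush(right, win);              /* complete the Put at the target */
    MPI_Win_unlock_all(win);

    MPI_Barrier(MPI_COMM_WORLD);            /* everyone's Put is done */
    printf("rank %d received %d from its left neighbor\n", rank, *buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```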

6.
The purpose of this paper is to compare the performance of MPICH with the vendor Message Passing Interface (MPI) on a Cray T3E‐900 and an SGI Origin 3000. Seven basic communication tests, covering basic point‐to‐point and collective MPI communication routines, were chosen to represent commonly‐used communication patterns. Cray's MPI performed better (and sometimes significantly better) than Mississippi State University's (MSU's) MPICH for small and medium messages. They both performed about the same for large messages; however, for three tests MSU's MPICH was about 20% faster than Cray's MPI. SGI's MPI performed and scaled better (and sometimes significantly better) than MPICH for all messages, except for the scatter test, where MPICH outperformed SGI's MPI for 1 kbyte messages. The poor scalability of MPICH on the Origin 3000 suggests there may be scalability problems with MPICH. Copyright © 2003 John Wiley & Sons, Ltd.
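A minimal sketch of the kind of point-to-point test such comparisons rely on: a ping-pong that averages round-trip time over many iterations. The message size, iteration count, and output format here are illustrative choices, not the paper's actual benchmark:

```c
/* Illustrative ping-pong latency test; run with at least 2 ranks (extra
 * ranks idle at the barrier and then finalize). */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

enum { NITER = 1000, MSG = 1024 };   /* 1 kbyte messages */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char buf[MSG];
    memset(buf, 0, sizeof(buf));
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < NITER; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0)   /* half the averaged round trip = one-way latency */
        printf("avg one-way latency: %.2f us\n",
               (MPI_Wtime() - t0) * 1e6 / (2.0 * NITER));
    MPI_Finalize();
    return 0;
}
```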

7.
MPI (Message Passing Interface) is the most widely used programming environment on large-scale clusters and grid platforms, and MPICH is its most widely adopted portable implementation. In a cluster system, communication time depends on many factors, such as the number of nodes, network bandwidth, topology, and software algorithms. To date, much research aimed at improving communication efficiency has focused on communication patterns at the application level, while the communication time required inside the MPICH system itself, particularly the time spent on job submission, is often overlooked. This paper analyzes MPICH's current job-submission method and proposes a series of improved algorithms, including a synchronous binary-tree method, an asynchronous binary-tree method, and a doubling method, which reduce communication time and optimize communication performance.
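The paper's improvements live inside MPICH's job launcher rather than in user code, but the dissemination pattern behind the doubling method is easy to illustrate at the MPI level. A minimal sketch, under the assumption that every rank that already holds the payload forwards it each round:

```c
/* Recursive-doubling dissemination, shown with ordinary MPI calls.
 * In round k every rank that already holds the data forwards it to the
 * rank 2^k positions away: 0->1, then {0,1}->{2,3}, and so on. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size, data = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) data = 42;                     /* payload at the root */

    for (int step = 1; step < size; step <<= 1) {
        if (rank < step && rank + step < size)
            MPI_Send(&data, 1, MPI_INT, rank + step, 0, MPI_COMM_WORLD);
        else if (rank >= step && rank < 2 * step)
            MPI_Recv(&data, 1, MPI_INT, rank - step, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }
    printf("rank %d has data = %d\n", rank, data);
    MPI_Finalize();
    return 0;
}
```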

8.
MPJ Express is a messaging system that allows application developers to parallelize their compute-intensive sequential Java codes on High Performance Computing clusters and multicore processors. In this paper, we extend the MPJ Express software with two new communication devices. The first device—called hybrid—enables MPJ Express to exploit hybrid parallelism on clusters of multicore processors by sitting on top of the existing shared memory and network communication devices. The second device—called native—uses JNI wrappers to interface MPJ Express with native MPI implementations such as MPICH and Open MPI. We evaluate the performance of these devices on a range of interconnects, including 1G/10G Ethernet, 10G Myrinet, and 40G InfiniBand. In addition, we analyze and evaluate the cost of the MPJ Express buffering layer and compare it with the performance numbers of other Java MPI libraries. Our evaluation reveals that the native device allows MPJ Express to achieve performance comparable to native MPI libraries—for latency and bandwidth of point-to-point and collective communications—a significant gain over the existing communication devices. The hybrid device—without any modifications at the application level—also helps parallel applications achieve better speedups and scalability by exploiting the multicore architecture. Our evaluation quantifies the cost incurred by buffering and its impact on the overall performance of the software: both new devices improve application performance and achieve up to 90% of the theoretical bandwidth, without any application rewriting effort, on workloads including the NAS Parallel Benchmarks and point-to-point and collective communication.

9.
郑启龙, 汪睿, 周寰. 《计算机应用》 (Journal of Computer Applications), 2011, 31(6): 1453-1457
Large-scale clusters have entered the multicore era, and multicore architectures place new demands on parallel computing. The Message Passing Interface (MPI) is the most widely used parallel programming model, and collective communication is an important component of MPI, so research on efficient collective communication algorithms plays an important role in improving the efficiency of parallel computing. The KD60 platform is a domestic teraflop-scale multicore cluster built from the first domestic multicore chip, the Loongson 3. This paper first analyzes the architectural characteristics of the KD60 multicore cluster and the hierarchical nature of communication under a multicore architecture; it then analyzes the implementation principle and shortcomings of the original collective communication algorithms. Finally, taking broadcast as an example, it improves the original algorithm with a scheme based on the chip multiprocessor (CMP) architecture, changing the communication pattern of the original algorithm and applying architecture-specific optimizations that exploit the characteristics of the KD60 platform. Experimental results show that the improved algorithm makes good use of the multicore structure and improves the performance of the collective broadcast algorithm.
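A hedged sketch of a hierarchy-aware broadcast in the spirit of the paper's CMP-based improvement: one broadcast stage between node leaders, then one within each node. The paper targets the KD60/Loongson 3 platform specifically; this sketch instead uses the generic MPI-3 routine MPI_Comm_split_type (not available in 2011-era MPI) as a stand-in for platform-specific topology detection, and assumes the broadcast root is rank 0:

```c
/* Hypothetical hierarchy-aware broadcast: inter-node stage among node
 * leaders, then an intra-node stage over shared memory. */
#include <mpi.h>

void hier_bcast(void *buf, int count, MPI_Datatype type, MPI_Comm comm) {
    MPI_Comm node;         /* ranks that share a node's memory hierarchy */
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node);

    int node_rank, world_rank;
    MPI_Comm_rank(node, &node_rank);
    MPI_Comm_rank(comm, &world_rank);

    /* One leader (local rank 0) per node joins the inter-node communicator;
     * rank 0 of comm becomes rank 0 of leaders, so it stays the root. */
    MPI_Comm leaders;
    MPI_Comm_split(comm, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leaders);

    if (leaders != MPI_COMM_NULL) {
        MPI_Bcast(buf, count, type, 0, leaders);   /* stage 1: across nodes */
        MPI_Comm_free(&leaders);
    }
    MPI_Bcast(buf, count, type, 0, node);          /* stage 2: within a node */
    MPI_Comm_free(&node);
}
```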

10.
In large-scale parallel computing systems, implementing a high-performance, scalable MPI system is essential for exploiting the system's parallelism effectively. CMEX is a connectionless user-level communication software interface that provides high-performance packet transmission and RDMA communication operations. MPICH2-CMEX is an MPI implementation based on CMEX: combining the characteristics of RDMA Read and RDMA Write operations, it implements multiple data-transfer channels and, exploiting the nearest-neighbor communication pattern common in parallel applications, a hybrid-channel data-transfer method. Application tests show that the MPICH2-CMEX system offers good performance and scalability.

11.
High performance scientific computing software is of critical international importance as it supports scientific explorations and engineering. Software development in this area is highly challenging owing to the use of parallel/distributed programming methods and complex communication and synchronization libraries. There is very little use of formal methods to debug software in this area, given that the scientific computing community and the formal methods community have not traditionally worked together. The Utah Gauss project combines expertise from scientific computing and formal methods in addressing this problem. We currently focus on MPI programs, which are the kind that run on over 60% of the world's supercomputers. These are programs written in C/C++/FORTRAN employing message passing concurrency supported by the Message Passing Interface (MPI) library. Large-scale MPI programs also employ shared memory threads to manage concurrency within smaller task sub-groups, capitalizing on the recent availability of small-scale (e.g. single-chip) shared memory multiprocessors; such mixed programming styles can result in additional bugs. MPI libraries themselves can be buggy as they strive to implement complex requirements employing aggressive techniques such as multi-threading. We have built a model extractor that extracts from MPI C programs a formal model consisting of communicating processes represented in Microsoft's Zing modeling language. MPI library functions are also being modeled in Zing. This allows us to run formal analysis on the models to detect bugs in the MPI programs being analyzed. Our preliminary results and future plans are described; in addition, our contribution is to expose the special needs of this area and suggest specific avenues for problem-driven advances in software model-checking applied to scientific computing software development and verification.

12.
Fine-grain MPI (FG-MPI) extends the execution model of MPI to allow for interleaved execution of multiple concurrent MPI processes inside an OS-process. It provides a runtime that is integrated into the MPICH2 middleware and uses light-weight coroutines to implement an MPI-aware scheduler. In this paper we describe the FG-MPI runtime system and discuss the main design issues in its implementation. FG-MPI enables expression of function-level parallelism, which, along with the runtime scheduler, can be used to simplify MPI programming and achieve performance without adding complexity to the program. As an example, we use FG-MPI to re-structure a typical use of non-blocking communication and show that the integrated scheduler relieves the programmer from scheduling computation and communication inside the application, moving that performance concern out of the program specification and into the runtime.
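For context, a minimal sketch of the hand-scheduled non-blocking pattern the paper restructures: the programmer manually interleaves computation between posting a halo exchange and waiting on it. Under FG-MPI the same overlap can instead come from scheduling many fine-grain MPI processes within one OS process. Function and variable names here are illustrative, not from the paper:

```c
/* Hand-scheduled overlap: independent computation is placed between
 * posting a non-blocking halo exchange and waiting on it. */
#include <mpi.h>

void halo_exchange_and_compute(double *send, double *recv, int n,
                               int left, int right, MPI_Comm comm) {
    MPI_Request reqs[2];
    MPI_Irecv(recv, n, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Isend(send, n, MPI_DOUBLE, right, 0, comm, &reqs[1]);

    /* ... interior computation that does not depend on the halo ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    /* ... boundary computation that consumes recv[] ... */
}
```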

13.
In this paper, we present the three libraries PACX-MPI, PLUS, and PVMPI that provide message-passing between different high-performance computers in metacomputing environments. Each library supports the development and execution of distributed metacomputer applications.

The PACX-MPI approach offers a transparent interface for the communication between two or more MPI environments. PVMPI allows the user to spawn parallel processes under the MPI environment. The PLUS protocol bridges the gap between vendor-specific (e.g., MPL, NX, and PARIX) and vendor-independent message-passing environments (e.g., PVM and MPI). Moreover, it offers the ability to create and control processes at application runtime.


14.
The virtual interface (VI) architecture standard was developed to satisfy the need for a high-throughput, low-latency communication system for cluster computing. VI architecture aims to close the gap between the bandwidth and latency provided by the communication hardware and those visible to the application by minimizing the software overhead on the critical path of communication. This paper presents the results of a performance study of one VI architecture hardware implementation, the Giganet cLAN (cluster LAN). The focus of the study is to assess and compare the performance of the different VI architecture data-transfer modes and the specific features available to higher-level communication software such as MPI, in order to help an implementor decide which VI architecture options to employ for various communication scenarios. Examples of such options include the use of send/receive vs. RDMA data transfers, polling vs. blocking to check completion of communication operations, multiple VIs, completion queues, and the scatter capabilities of VI architecture.

15.
In the last decade, cluster computing has become the most popular high-performance computing architecture. Although numerous technological innovations have been proposed to improve the interconnection of nodes, many clusters still rely on commodity Ethernet hardware to implement message-passing within parallel applications. We present Open-MX, an open-source message-passing stack over generic Ethernet. It offers the same abilities as the specialized Myrinet Express stack, without requiring dedicated support from the networking hardware. Open-MX works transparently in the most popular MPI implementations through its MX interface compatibility. It also enables interoperability between hosts running the specialized MX stack and generic Ethernet hosts. We detail how Open-MX copes with the inherent limitations of the Ethernet hardware to satisfy the requirements of message-passing by applying an innovative copy offload model. Combined with a careful tuning of the fabric and of the MX wire protocol, Open-MX achieves better performance than TCP implementations, especially on 10 gigabit/s hardware.

16.
Concurrency and parallelism have long been viewed as important, but somewhat distinct concepts. While concurrency is extensively used to amortize latency (for example, in web- and database-servers, user interfaces, etc.), parallelism is traditionally used to enhance performance through execution on multiple functional units. Motivated by an evolving application mix and trends in hardware architecture, there has been a push toward integrating traditional programming models for concurrency and parallelism. Use of conventional threads APIs (POSIX, OpenMP) with messaging libraries (MPI), however, leads to significant programmability concerns, owing primarily to their disparate programming models. In this paper, we describe a novel API and associated runtime for concurrent programming, called MPI Threads (MPIT), which provides a portable and reliable abstraction of low-level threading facilities. We describe various design decisions in MPIT, their underlying motivation, and associated semantics. We provide performance measurements for our prototype implementation to quantify overheads associated with various operations. Finally, we discuss two real-world use cases: an asynchronous message queue and a parallel information retrieval system. We demonstrate that MPIT provides a versatile, low overhead programming model that can be leveraged to program large parallel ensembles.
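MPIT's own API is not given in the abstract, so the sketch below instead shows the conventional mixed model whose programmability concerns motivate it: POSIX threads combined with MPI, where the application must request a threading level at initialization before any thread may call MPI:

```c
/* Conventional mixed threads+MPI style: the program requests
 * MPI_THREAD_MULTIPLE so that any thread may call MPI. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* legal only at THREAD_MULTIPLE */
    printf("worker thread running on rank %d\n", rank);
    return NULL;
}

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not supported\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);

    MPI_Finalize();
    return 0;
}
```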

17.
We propose extensions to the message passing interface (MPI) that generalize the MPI communicator concept to allow multiple communication endpoints per process, dynamic creation of endpoints, and the transfer of endpoints between processes. The generalized communicator construct can be used to express a wide range of interesting communication structures, including collective communication operations involving multiple threads per process, communications between dynamically created threads or processes, and object-oriented applications in which communications are directed to specific objects. Furthermore, this enriched functionality can be provided in a manner that preserves backward compatibility with MPI. We describe the proposed extensions, illustrate their use with examples, and describe a prototype implementation in the popular MPI implementation MPICH.

18.
19.
This paper discusses several approaches to designing and implementing shared‐memory communication protocol modules for the message‐passing interface (MPI) libraries, colloquially called ‘shared‐memory devices’. The authors present a new taxonomy for classifying designs for shared‐memory MPI communication devices and formulate design evaluation criteria. Using these criteria, the authors compare three existing shared‐memory devices for MPICH and choose the best one. The authors also present experimental results that support their choice. The contributions of this paper are three‐fold. First, the authors present the taxonomy for shared‐memory communication devices. Second, they show advantages and potential problems of the devices that belong to different classes of their taxonomy using the formulated design criteria. Third, they analyze communication performance of existing MPICH shared‐memory devices, discuss optimizations of their performance, and show the performance gains that these optimizations yield. MPICH is used for comparison, since it is a widely used MPI implementation. Copyright © 2000 John Wiley & Sons, Ltd.

20.
Overlapping computation with communication is a key technique to conceal the effect of communication latency on the performance of parallel applications. Message Passing Interface (MPI) is a widely used message passing standard for high performance computing. One of the most important factors in achieving a good level of overlap is the MPI ability to make progress on outstanding communication operations. In this paper, we propose a novel speculative MPI Rendezvous protocol that uses RDMA Read and RDMA Write to effectively improve communication progress and consequently the overlap ability. Performance results based on a modified MPICH2 implementation over 10-Gigabit iWARP Ethernet reveal a significant (80–100%) improvement in receiver side overlap and progress ability. We have also observed up to 30% improvement in application wait time for some NPB applications as well as the RADIX application. For applications that do not benefit from this protocol, an adaptation mechanism is used to stop the speculation to effectively reduce the protocol overhead.
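A sketch of the progress problem the speculative protocol targets, with illustrative names not taken from the paper: large messages typically use a rendezvous handshake, so without asynchronous progress the transfer may only advance when the application re-enters the MPI library, for example through MPI_Test:

```c
/* Without an asynchronous progress engine, a large MPI_Isend's rendezvous
 * handshake may only advance inside MPI calls, so the application
 * periodically calls MPI_Test between slices of computation. */
#include <mpi.h>

void send_with_manual_progress(double *buf, int n, int dest, MPI_Comm comm) {
    MPI_Request req;
    int done = 0;
    MPI_Isend(buf, n, MPI_DOUBLE, dest, 0, comm, &req);  /* large message */

    while (!done) {
        /* ... a slice of independent computation ... */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* drives the handshake */
    }
}
```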
