1.

Liao Mingxue, He Xiaoxin, Fan Zhihua 《计算机工程》 (Computer Engineering) 2008,34(17):274-275

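Statically detecting deadlocks in the synchronous communication of MPI programs is difficult and usually requires building a model of the program. The sequential model is the basis of all other, more complex models. A mapping method converts the sequential model into a set of strings, turning deadlock detection into an equivalent multi-queue string matching problem; on this basis, a static deadlock detection algorithm for the sequential model of MPI synchronous communication is designed and implemented. The algorithm outperforms the usual cycle detection methods and can accommodate dynamic message flows.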
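
The queue-matching idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each process is a straight-line sequence of blocking synchronous operations with no wildcard receives, and the function name `find_deadlock` and the `("send", peer)` / `("recv", peer)` encoding are hypothetical choices made for illustration.

```python
from collections import deque


def find_deadlock(programs):
    """Return the set of ranks left blocked once no more operations can be matched.

    programs maps a rank to its operations in program order, each written as
    ("send", peer) or ("recv", peer). This encoding is an assumption of the sketch.
    """
    queues = {rank: deque(tuple(op) for op in ops) for rank, ops in programs.items()}
    progress = True
    while progress:
        progress = False
        for rank, q in queues.items():
            if not q:
                continue
            kind, peer = q[0]
            peer_q = queues.get(peer)
            if not peer_q:
                continue  # peer is unknown or has already finished
            # A synchronous send completes only against the matching receive, and vice versa.
            want = ("recv" if kind == "send" else "send", rank)
            if peer_q[0] == want:
                q.popleft()
                peer_q.popleft()
                progress = True
    return {rank for rank, q in queues.items() if q}


# Both ranks send first, so neither send can complete.
blocked = find_deadlock({0: [("send", 1), ("recv", 1)],
                         1: [("send", 0), ("recv", 0)]})
```

An empty result means every queue drained, so this particular model has no synchronous-communication deadlock; a non-empty result lists the ranks that can never proceed.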

2.

Application-oriented ping-pong benchmarking: how to assess the real communication overheads

Timo Schneider Robert Gerstenberger Torsten Hoefler 《Computing》2014,96(4):279-292

Moving data between processes has often been discussed as one of the major bottlenecks in parallel computing. There is a large body of research striving to improve communication latency and bandwidth on different networks, measured with ping-pong benchmarks of different message sizes. In practice, the data to be communicated generally originates from application data structures and needs to be serialized before communicating it over serial network channels. This serialization is often done by explicitly copying the data to communication buffers. The message passing interface (MPI) standard defines derived datatypes to allow zero-copy formulations of non-contiguous data access patterns. However, many applications still choose to implement manual pack/unpack loops, partly because they are more efficient than some MPI implementations. MPI implementers, on the other hand, do not have good benchmarks that represent important application access patterns. We demonstrate that the data serialization can consume up to 80% of the total communication overhead for important applications. This indicates that most of the current research on optimizing serial network transfer times may be targeted at the smaller fraction of the communication overhead. To support the scientific community, we extracted the send/recv-buffer access patterns of a representative set of scientific applications to build a benchmark that includes serialization and communication of application data and thus reflects all communication overheads. This can be used like traditional ping-pong benchmarks to determine the holistic communication latency and bandwidth as observed by an application. It supports serialization loops in C and Fortran as well as MPI datatypes for representative application access patterns. Our benchmark, consisting of seven micro-applications, unveils significant performance discrepancies between the MPI datatype implementations of state-of-the-art MPI implementations. Our micro-applications aim to provide a standard benchmark for MPI datatype implementations to guide optimizations similarly to the established benchmarks SPEC CPU and Livermore Loops.

3.

Design, implementation and evaluation of a deadlock-free routing algorithm for concurrent computers

M. Cannataro G. Spezzano D. Talia E. Gallizzi 《Concurrency and Computation》1992,4(2):143-161

This paper describes the design, the implementation, and the performance results of a routing algorithm which provides deadlock-free communication in a tightly coupled message-passing concurrent computer. The algorithm is adaptive, isolated and uses the store-and-forward technique. It allows message communication between two processes regardless of where they are physically located on the network. The routing algorithm has many positive characteristics including provable deadlock freedom, guaranteed message arrival, and automatic local congestion reduction. It can be used as a basis for the design of high-level communication primitives. An Occam implementation on a network of Inmos Transputers is discussed. The experimental results show that the routing algorithm is effective in supporting process-to-process communication on a concurrent computer.

4.

Performance Modeling and Evaluation of MPI

《Journal of Parallel and Distributed Computing》2001,61(2):202-223

Users of parallel machines need to have a good grasp of how different communication patterns and styles affect the performance of message-passing applications. LogGP is a simple performance model that reflects the most important parameters required to estimate the communication performance of parallel computers. The message passing interface (MPI) standard provides new opportunities for developing high performance parallel and distributed applications. In this paper, we use LogGP as a conceptual framework for evaluating the performance of MPI communications on three platforms: Cray Research T3D, Convex Exemplar 1600SP, and a network of workstations (NOW). We develop a simple set of communication benchmarks to extract the LogGP parameters. Our objective is to compare the performance of MPI communication on several platforms and to identify a performance model suitable for MPI performance characterization. In particular, two problems are addressed: how LogGP quantifies MPI performance and what extra features are required for modeling MPI, and how MPI performance compares on the three computing platforms: Cray Research T3D, Convex Exemplar 1600SP, and workstation clusters.
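
For orientation, the usual LogGP cost of a single point-to-point message can be written out as below. This is a sketch of the standard model rather than anything stated in the abstract: it assumes the conventional parameter meanings (L network latency, o per-message processor overhead, G gap per byte for long messages), and the names T_msg and T_pingpong are introduced here only for illustration.

```latex
% End-to-end time for one k-byte message under LogGP:
% sender overhead, then (k-1) per-byte gaps, then network latency, then receiver overhead.
T_{\mathrm{msg}}(k) = o + (k-1)\,G + L + o = L + 2o + (k-1)\,G

% A ping-pong round trip with k-byte messages in both directions:
T_{\mathrm{pingpong}}(k) = 2\,T_{\mathrm{msg}}(k) = 2L + 4o + 2(k-1)\,G
```

Under these assumptions, fitting measured ping-pong times for small k estimates L + 2o, while the slope for large k estimates G.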

5.

Generalized communicators in the message passing interface

Demaine E.D. Foster I. Kesselman C. Snir M. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(6):610-616

We propose extensions to the message passing interface (MPI) that generalize the MPI communicator concept to allow multiple communication endpoints per process, dynamic creation of endpoints, and the transfer of endpoints between processes. The generalized communicator construct can be used to express a wide range of interesting communication structures, including collective communication operations involving multiple threads per process, communications between dynamically created threads or processes, and object-oriented applications in which communications are directed to specific objects. Furthermore, this enriched functionality can be provided in a manner that preserves backward compatibility with MPI. We describe the proposed extensions, illustrate their use with examples, and describe a prototype implementation in the popular MPI implementation MPICH.

6.

A deadlock detection interface to a commercial simulation language

Murali Krishnamurthi Sanjeev Thallikar 《Computers & Industrial Engineering》1998,34(4):743-757

In this research, issues related to deadlock detection in discrete event simulation models and the feasibility of interfacing a deadlock detection algorithm to a commercial simulation language have been explored. For the purpose of this research, a deadlock detection algorithm has been designed, developed and interfaced to the SIMAN commercial simulation language. Both the algorithm and the interface have been validated using a set of sample scenarios. The details of the deadlock detection algorithm, the interface to the chosen commercial simulation language and the validation scenarios are discussed in this paper, along with the lessons learned and recommendations for future enhancements.

7.

Deadlock detection without wait-for graphs

Dror G. Feitelson 《Parallel Computing》1991,17(12):1377-1383

Deadlock detection is an important service that the run-time system of a parallel environment should provide. In parallel programs deadlock can occur when the different processes are waiting for various events, as opposed to concurrent systems, where deadlock occurs when processes wait for resources held by other processes. Therefore classical deadlock detection techniques such as checking for cycles in the wait-for graph are inapplicable. An alternative algorithm that checks whether all the processes are blocked is presented. This algorithm deals with situations in which the state transition from blocked to unblocked is indirect, as may happen when busy-waiting is used.

8.

Network-on-chip optimization for high-bandwidth I/O

Shi Wei, Gong Rui, Liu Wei, Wang Lei, Feng Quanyou, Zhang Jianfeng 《计算机工程与科学》 (Computer Engineering & Science) 2021,43(9):1538-1545

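In high-performance processors, I/O bandwidth demand keeps growing: the number of high-speed interface lanes keeps increasing, and interface transfer rates are also rising. The network-on-chip of a high-performance processor must match the bandwidth requirements of the various high-speed I/O interfaces and must guarantee that DMA requests complete correctly. However, high-speed interface protocols differ considerably from network-on-chip protocols in their communication mechanisms, which can lead to deadlock and similar problems. The paper first analyzes the problems of networks-on-chip serving high-performance I/O, then proposes a high-bandwidth I/O design method and a deadlock resolution method. A network-on-chip that uses the deadlock resolution method makes the I/O system more robust, relaxes various design-time and run-time restrictions on the network-on-chip, and improves I/O performance. Finally, the proposed optimizations are applied to a high-performance server processor chip and evaluated: for a 16-lane PCIe 4.0 interface, bidirectional read and write bandwidth each reach 30 GB/s, and when deadlock occurs in certain special scenarios, the network-on-chip detects and resolves it automatically.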

9.

Local Distributed Deadlock Detection by Cycle Detection and Clustering

《IEEE transactions on pattern analysis and machine intelligence》1987,(1):3-14

A distributed algorithm for the detection of deadlocks in store-and-forward communication networks is presented. At first, we focus on a static environment and develop an efficient knot detection algorithm for general graphs. The knot detection algorithm uses at most O(n² + m) messages and O(log n) bits of memory to detect all deadlocked nodes in the static network. Using the knot detection algorithm as a building block, a deadlock detection algorithm in a dynamic environment is developed. This algorithm has the following properties: It detects all the nodes which cause the deadlock. The algorithm is triggered only when there is a potential for deadlock and only those nodes which are potentially deadlocked perform the algorithm. The algorithm does not affect other processes at the nodes.
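
To make the notion of a knot concrete, here is a small centralized sketch. It is not the distributed algorithm from the paper and does not reproduce its message or memory bounds. It assumes the common definition that a node lies in a knot when it has at least one outgoing edge and every node reachable from it can reach it back; the function names `reachable` and `knot_nodes` and the adjacency-dict graph encoding are hypothetical.

```python
def reachable(graph, start):
    """Return the nodes reachable from start by following one or more edges."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def knot_nodes(graph):
    """Return the nodes that lie in a knot of a directed graph.

    graph maps each node to an iterable of successor nodes (an assumed encoding).
    """
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    reach = {v: reachable(graph, v) for v in nodes}
    # v is in a knot if it can go somewhere and every node it can reach leads back to v.
    return {v for v in nodes if reach[v] and all(v in reach[w] for w in reach[v])}


# b and c wait on each other with no way out; a and d merely lead into that knot.
wait_for = {"a": ["b"], "b": ["c"], "c": ["b"], "d": ["a", "e"], "e": []}
deadlocked = knot_nodes(wait_for)
```

A cycle with an escape edge, such as x -> y -> x with y -> z and z idle, is not a knot under this definition, which is why knot detection is stricter than plain cycle detection.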

10.

Petri-net-based detection and prevention of communication deadlocks in parallel programs

Cui Huanqing, Liu Qiang 《计算机工程》 (Computer Engineering) 2008,34(23):50-52

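Deadlock freedom is one of the main conditions for the correctness of parallel programs. Existing research focuses on deadlock detection, with little work on deadlock prevention. After classifying and introducing the various communication processes of message-passing parallel programs, this paper models them with Petri nets, establishes the correspondence between program deadlocks and dead markings of the Petri net, and gives a communication deadlock detection algorithm. It then proposes three prevention methods for the two causes of communication deadlock and identifies the best one by comparison. The approach is general and can also be applied to deadlock prevention at the parallel algorithm design stage, improving the efficiency of parallel programming.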

11.

Research on a SUIF2-based static deadlock detection method

Hao Chuang, Zhang Zhixiang, Zhang Jingbo 《计算机与数字工程》 (Computer & Digital Engineering) 2012,40(7):69-72

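Deadlock is one of the common errors in concurrent programs, and the nondeterminism of concurrent execution makes it hard to detect. To address this, based on an analysis of deadlocks in multithreaded C programs, a SUIF2-based static deadlock detection method is proposed, together with a framework for static deadlock detection of multithreaded C programs on SUIF2 and a lockset analysis algorithm. An example demonstrates the effectiveness of the method.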

12.

A fault-tolerant deadlock detection and resolution algorithm for the generalized deadlock model in distributed systems

Cheng Xin, Liu Hongwei, Dong Jian, Yang Xiaozong 《计算机研究与发展》 (Journal of Computer Research and Development) 2007,44(5):798-805

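Distributed systems technology offers an effective way to build high-performance systems at low cost, but conflicts between resource allocation and demand can cause deadlocks that bring the system to a halt. In unreliable distributed systems, faults interfere with normal deadlock detection, yet existing deadlock detection algorithms are not fault tolerant. Failure modes are classified, and a fault-tolerant deadlock detection and resolution algorithm is proposed. The algorithm is built on the general AND-OR model and uses diffusing computation with centralized reduction; it not only detects deadlocks but also reports all members of the deadlock cycle. If the deadlock topology is static and forms a cycle, the upper bound on the message complexity is e n-1 and the time complexity is d, where e is the number of edges in the wait-for graph and n and d are the number of nodes forming the deadlock cycle. Analysis shows that the algorithm performs as well as or better than comparable algorithms.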

13.

Design and implementation of HPCL, a high-performance communication library for cluster computing environments

Hu Changjun, Li Jia, Yin Yixin, Wang Jue 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2006,27(12):2196-2200

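This paper describes the research and implementation of HPCL, a high-performance communication library. Its technical features include support for multiple channels, simultaneous communication by multiple processes, reliable message delivery, channel context switching, time-shared use of one channel by multiple processes, and multicast, together with low latency and high throughput. These features are implemented with an improved ACK/NACK flow control algorithm, an Immediate Sending technique based on Fast Ethernet network interfaces, and an Interrupt Reaping technique. A comparison of HPCL and TCP/IP on communication and application performance using typical benchmarks shows that HPCL achieves the desired throughput and latency.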

14.

Optimization and Performance of a Fortran 90 MPI-Based Unstructured Code on Large-Scale Parallel Systems

Shires Dale Mohan Ram 《The Journal of supercomputing》2003,25(2):131-141

The message-passing interface (MPI) has become the standard in achieving effective results when using the message passing paradigm of parallelization. Codes written using MPI are extremely portable and are applicable to both clusters and massively parallel computing platforms. Since MPI uses the single program, multiple data (SPMD) approach to parallelism, good performance requires careful tuning of the serial code as well as careful data and control flow analysis to limit communication. We discuss optimization strategies used and their degree of success to increase performance of an MPI-based unstructured finite element simulation code written in Fortran 90. We discuss performance results based on implementations using several modern massively parallel computing platforms including the SGI Origin 3800, IBM Nighthawk 2 SMP, and Cray T3E-1200.

15.

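A preliminary study of implementing a parallel fractional-step algorithm for the three-dimensional Navier-Stokes equations on heterogeneous platforms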

Xu Ying, Xu Lei, Jiang Kai 《计算机工程与科学》 (Computer Engineering & Science) 2012,34(9):33-39

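This paper takes the fractional-step method for the three-dimensional incompressible flow equations, in which the momentum equations are solved iteratively with the BiCGSTAB algorithm and the pressure Poisson equation is solved directly with a Fourier transform method. It studies the parallelization of this algorithm on cluster platforms: starting from domain decomposition, it analyzes the computation and communication volume on each parallel processor under one-, two-, and three-dimensional partitioning, and adopts two-dimensional domain decomposition based on that analysis. It then analyzes how to port the BiCGSTAB algorithm and the Fourier-transform Poisson solver to GPGPU heterogeneous platforms. Finally, it analyzes the parallel performance of the two algorithms on a CPU cluster and on a GPGPU heterogeneous platform.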

16.

Hybrid MPI-thread parallelization of adaptive mesh operations

《Parallel Computing》2016

Many of the world’s leading supercomputer architectures are a hybrid of shared memory and network-distributed memory. Such an architecture lends itself to a hybrid MPI-thread programming model. We first present an implementation of inter-thread message passing based on the MPI and pthread libraries. In addition, we present an efficient implementation of termination detection for communication rounds. We use the term phased message passing to denote the communication interface based on this termination detection. This interface is then used to implement parallel operations for adaptive unstructured meshes, and the performance of resulting applications is compared to pure MPI operation. We also present new workflows enabled by the ability to vary the number of threads during runtime.

17.

MPI-based parallel octree collision detection (total citations: 6; self-citations: 1; citations by others: 5)

Liu Xiaoping, Cao Li 《计算机辅助设计与图形学学报》 (Journal of Computer-Aided Design & Computer Graphics) 2007,19(2):184-187,192

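An analysis of the collision detection process shows that the nodes are only weakly dependent on one another, which makes parallelization possible. With suitable modifications to the octree collision detection algorithm, and using the mature Message Passing Interface (MPI) parallel programming environment, an MPI-based parallel collision detection algorithm is proposed. Test results show a considerable improvement in collision detection efficiency.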

18.

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory

Torsten Hoefler James Dinan Darius Buntinas Pavan Balaji Brian Barrett Ron Brightwell William Gropp Vivek Kale Rajeev Thakur 《Computing》2013,95(12):1121-1136

Hybrid parallel programming with the message passing interface (MPI) for internode communication in conjunction with a shared-memory programming model to manage intranode parallelism has become a dominant approach to scalable parallel programming. While this model provides a great deal of flexibility and performance potential, it saddles programmers with the complexity of utilizing two parallel programming systems in the same application. We introduce an MPI-integrated shared-memory programming model that is incorporated into MPI through a small extension to the one-sided communication interface. We discuss the integration of this interface with the MPI 3.0 one-sided semantics and describe solutions for providing portable and efficient data sharing, atomic operations, and memory consistency. We describe an implementation of the new interface in the MPICH2 and Open MPI implementations and demonstrate an average performance improvement of 40% to the communication component of a five-point stencil solver.

19.

On porting PVM applications to MPI

Zhu Jianqiu, Zhou Lijuan 《计算机工程与应用》 (Computer Engineering and Applications) 1999,35(3):27-29

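Message passing is a model widely used on parallel machines, especially distributed-memory parallel machines. PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) are both popular message-passing parallel libraries. PVM's message-passing interface, because of its simplicity, does not give users the greatest flexibility to achieve the best performance; for this reason, the working group of the message passing standards forum defined the MPI standard, which makes optimal performance attainable. By comparing PVM and MPI, this paper points out the benefits and the potential pitfalls of porting a PVM application to MPI. If an application can avoid these pitfalls, porting can improve its communication performance and hence the performance of its distributed computation.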

20.

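A potential deadlock detection algorithm for concurrent programs based on dependence analysis (total citations: 1; self-citations: 0; citations by others: 1)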

Lu Chao, Lu Yansheng, Xie Xiaodong, Zhao Xiaosong 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2007,28(5):841-844

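Deadlock is a run-time error specific to concurrent programs. Because the execution of concurrent programs is nondeterministic, detecting and locating deadlocks is very difficult. This paper proposes a potential deadlock detection algorithm for concurrent programs based on dependence analysis. It is a static analysis algorithm that can detect whether a concurrent program contains potential deadlocks and can locate the statement nodes at which each thread may be suspended when the deadlock occurs. The paper gives a formal definition and a time complexity analysis of the algorithm; experimental results show that the algorithm is correct and effective.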