首页 | 本学科首页   官方微博 | 高级检索  
     


MPI-RCDD: A Framework for MPI Runtime Communication Deadlock Detection
Authors:Hong-Mei Wei  Jian Gao  Peng Qing  Kang Yu  Yan-Fei Fang  Ming-Lu Li
Affiliation:1.Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;2.Jiangnan Institute of Computing Technology, Wuxi 214083, China
Abstract:The message passing interface (MPI) has become a de facto standard for programming models of highperformance computing, but its rich and flexible interface semantics makes the program easy to generate communication deadlock, which seriously affects the usability of the system. However, the existing detection tools for MPI communication deadlock are not scalable enough to adapt to the continuous expansion of system scale. In this context, we propose a framework for MPI runtime communication deadlock detection, namely MPI-RCDD, which contains three kinds of main mechanisms. Firstly, MPI-RCDD has a message logging protocol that is associated with deadlock detection to ensure that the communication messages required for deadlock analysis are not lost. Secondly, it uses the asynchronous processing thread provided by the MPI to implement the transfer of dependencies between processes, so that multiple processes can participate in deadlock detection simultaneously, thus alleviating the performance bottleneck problem of centralized analysis. In addition, it uses an AND⊕OR model based algorithm named AODA to perform deadlock analysis work. The AODA algorithm combines the advantages of both timeout-based and dependency-based deadlock analysis approaches, and allows the processes in the timeout state to search for a deadlock circle or knot in the process of dependency transfer. Further, the AODA algorithm cannot lead to false positives and can represent the source of the deadlock accurately. The experimental results on typical MPI communication deadlock benchmarks such as Umpire Test Suit demonstrate the capability of MPIRCDD. Additionally, the experiments on the NPB benchmarks obtain the satisfying performance cost, which show that the MPI-RCDD has strong scalability.
Keywords:high-performance computing  message passing interface (MPI)  communication deadlock  deadlock detection  AND⊕OR model  
本文献已被 维普 等数据库收录!
点击此处可从《计算机科学技术学报》浏览原始摘要信息
点击此处可从《计算机科学技术学报》下载免费的PDF全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号