A Network-Failure-Tolerant Message-Passing System for Terascale Clusters
Authors: Richard L. Graham, Sung-Eun Choi, David J. Daniel, Nehal N. Desai, Ronald G. Minnich, Craig E. Rasmussen, L. Dean Risinger, Mitchel W. Sukalski
Affiliation: (1) Advanced Computing Laboratory, MS-B287, Los Alamos National Laboratory, Los Alamos, New Mexico, 87545
Abstract: The Los Alamos Message Passing Interface (LA-MPI) is an end-to-end network-failure-tolerant message-passing system designed for terascale clusters. LA-MPI is a standard-compliant implementation of MPI designed to tolerate network-related failures, including I/O bus errors, network card errors, and wire-transmission errors. This paper details the distinguishing features of LA-MPI, including support for concurrent use of multiple types of network interfaces and reliable message transmission over multiple network paths and routes between a given source and destination. In addition, performance measurements on production-grade platforms are presented.
This article is indexed in SpringerLink and other databases.