An efficient checkpointing method for multicomputers with wormhole routing
Authors: Kai Li (1), Jeffrey F. Naughton (2), James S. Plank (1)
Affiliation: (1) Department of Computer Science, Princeton University, Princeton, New Jersey 08544; (2) Department of Computer Science, University of Wisconsin at Madison, Madison, Wisconsin 53706
Abstract: Efficient checkpointing and resumption of multicomputer applications is essential if multicomputers are to support time-sharing and the automatic resumption of jobs after a system failure. We present a checkpointing scheme that is transparent, imposes overhead only during checkpoints, requires minimal message logging, and allows for quick resumption of execution from a checkpointed image. Furthermore, the checkpointing algorithm allows each processor p to continue running the application being checkpointed except during the time that p is actively taking a local snapshot, and requires no global stop or freeze of the multicomputer. Since checkpointing multicomputer applications poses requirements different from those posed by checkpointing general distributed systems, existing distributed checkpointing schemes are inadequate for multicomputer checkpointing. Our checkpointing scheme makes use of special properties of wormhole routing networks to satisfy this new set of requirements.
Keywords: checkpointing, fault-tolerance, multicomputers, wormhole routing
This article is indexed in SpringerLink and other databases.