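A Checkpointing Strategy Based on High-Performance Cluster Computing Systems
Cite this article: SUI Cui-cui, YAN Hai-hua. A Checkpointing Strategy Based on High-Performance Cluster Computing Systems [J]. Microelectronics & Computer, 2008, 25(10).
Authors: SUI Cui-cui  YAN Hai-hua
Affiliation: School of Computer Science, Beihang University, Beijing 100191, China
Abstract: Checkpointing is widely used to improve the fault tolerance of high-performance cluster computing systems. Most current checkpointing schemes use a coordinated protocol. As the cluster scales up, the synchronization operations of that protocol incur a large time overhead and block normal computation. To address this problem, the paper uses an uncoordinated checkpointing protocol to eliminate the synchronization operations, uses message logging to keep the system state consistent, and takes checkpoints in a background thread so that checkpointing is transparent to the application. Finally, experiments on a typical system verify the effectiveness of the method and compare its time overhead with that of the coordinated protocol.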

Keywords: checkpointing  fault tolerance  cluster system  non-blocking protocol

Checkpointing Strategy for High Performance Computing Cluster System
SUI Cui-cui, YAN Hai-hua. Checkpointing Strategy for High Performance Computing Cluster System [J]. Microelectronics & Computer, 2008, 25(10).
Authors: SUI Cui-cui  YAN Hai-hua
Abstract: Checkpointing is an effective way to improve the reliability of cluster systems, and lowering its overhead is an active research topic. Existing studies point out that rollback recovery under uncoordinated checkpointing is difficult to implement, so it has seen little application in cluster systems. Compared with existing schemes, this paper uses non-blocking checkpointing and reduces synchronization operations, which cuts system overhead and improves efficiency.
Keywords: checkpointing  fault tolerance  cluster system  uncoordinated protocol
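
The abstract names three ingredients: checkpoints taken without cross-process synchronization, a message log that keeps recovery consistent, and a background thread that does the checkpoint write. The following is a minimal single-machine sketch of how those pieces could fit together. It is not the authors' implementation: the class `LoggedProcess`, its methods, and the in-memory stand-ins for stable storage are all hypothetical names chosen for illustration.

```python
import copy
import pickle
import threading


class LoggedProcess:
    """One simulated cluster process (hypothetical model, not the paper's code)."""

    def __init__(self, rank):
        self.rank = rank
        self.state = {"total": 0}                    # application state
        self.message_log = []                        # messages received since the last checkpoint
        self.checkpoint = pickle.dumps(self.state)   # in-memory stand-in for stable storage
        self._lock = threading.Lock()

    def deliver(self, value):
        """Log an incoming message, then apply it to the application state."""
        with self._lock:
            self.message_log.append(value)
            self.state["total"] += value

    def take_checkpoint(self):
        """Checkpoint this process alone: no barrier with any other process."""
        with self._lock:
            snapshot = copy.deepcopy(self.state)
            covered = len(self.message_log)          # log entries already reflected in snapshot

        def write():
            data = pickle.dumps(snapshot)            # slow part runs off the compute path
            with self._lock:
                # Install the checkpoint and trim the log in one atomic step,
                # so checkpoint + log always reproduce the current state.
                self.checkpoint = data
                del self.message_log[:covered]

        worker = threading.Thread(target=write)
        worker.start()
        return worker

    def recover(self):
        """Roll back to the last checkpoint and replay the logged messages."""
        with self._lock:
            self.state = pickle.loads(self.checkpoint)
            for value in self.message_log:
                self.state["total"] += value


if __name__ == "__main__":
    p = LoggedProcess(rank=0)
    p.deliver(3)
    p.deliver(4)
    p.take_checkpoint().join()
    p.deliver(5)
    p.state = {"total": -1}    # simulate losing the in-memory state
    p.recover()
    print(p.state["total"])
```

In this sketch the caller only holds the lock long enough to copy the state, which is the sense in which the checkpoint is non-blocking. A real system would also have to write the message log itself to stable storage and handle messages in flight between processes, which the sketch leaves out.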
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.