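Method for Reducing Checkpoint Overhead of Parallel Program
Cite this article: ZHOU Xiaocheng, SUN Ninghui, HUO Zhigang, MA Jie. Method for Reducing Checkpoint Overhead of Parallel Program[J]. Computer Engineering, 2007, 33(12): 84-86.
Authors: ZHOU Xiaocheng  SUN Ninghui  HUO Zhigang  MA Jie
Affiliations: [1] Graduate School of the Chinese Academy of Sciences, Beijing 100080; [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
Funding: Chinese Academy of Sciences research fund for key technologies of new-generation clusters
Abstract: Checkpointing and rollback recovery is an effective way to improve system reliability and implement fault-tolerant computing. Its performance is usually evaluated by the overhead ratio, and checkpoint overhead is the main factor affecting the overhead ratio. Because parallel programs currently spend considerable time blocked on communication at run time, this paper proposes a method that further reduces checkpoint overhead, building on copy-on-write checkpoint buffering. By controlling the scheduling of the state-saving thread and choosing a suitable state-saving granularity, the method makes good use of communication blocking time to hide the overhead introduced by the state-saving thread, and thereby further reduces the overhead ratio.

Keywords: Checkpointing and rollback recovery  Checkpoint overhead  Communication blocking time
Article ID: 1000-3428(2007)12-0084-03
Revised: 2006-06-30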

Method for Reducing Checkpoint Overhead of Parallel Program
ZHOU Xiaocheng, SUN Ninghui, HUO Zhigang, MA Jie. Method for Reducing Checkpoint Overhead of Parallel Program[J]. Computer Engineering, 2007, 33(12): 84-86.
Authors: ZHOU Xiaocheng  SUN Ninghui  HUO Zhigang  MA Jie
Affiliation: 1. Graduate School of Chinese Academy of Sciences, Beijing 100080; 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
Abstract: Checkpointing and rollback recovery is an effective way to improve system reliability and implement fault-tolerant computation. Its performance is usually evaluated by the overhead ratio, which is primarily affected by checkpoint overhead. Because a parallel program spends considerable time blocked on communication while it runs, a method based on copy-on-write checkpoint buffering is proposed to further reduce checkpoint overhead. By controlling when the checkpointing thread runs and selecting a suitable state-saving granularity, the method uses communication blocking time to hide the overhead caused by the checkpointing thread and thus further reduces the overhead ratio.
Keywords: Checkpointing and rollback recovery  Checkpoint overhead  Communication blocking time
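
The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea it describes, not the authors' implementation. It assumes the application can signal when it enters and leaves a blocking communication call. All names (`IdleTimeCheckpointSaver`, `enter_blocking`, `leave_blocking`, `chunk_size`, `sink`) are hypothetical, and an eager `bytes()` copy stands in for a real copy-on-write buffer.

```python
import io
import threading


class IdleTimeCheckpointSaver:
    """Hypothetical sketch: save a buffered snapshot in fixed-size chunks,
    starting each chunk only while the application is blocked on communication."""

    def __init__(self, sink, chunk_size):
        self.sink = sink                  # any object with write(), e.g. a file
        self.chunk_size = chunk_size      # state-saving granularity, in bytes
        self.blocked = threading.Event()  # set while the app waits on communication
        self.done = threading.Event()     # set once the whole snapshot is saved

    def start(self, snapshot):
        """Begin saving an immutable snapshot in a background thread."""
        self.done.clear()
        worker = threading.Thread(target=self._run, args=(snapshot,), daemon=True)
        worker.start()

    def _run(self, snapshot):
        for offset in range(0, len(snapshot), self.chunk_size):
            self.blocked.wait()  # pause until the app is blocked again
            self.sink.write(snapshot[offset:offset + self.chunk_size])
        self.done.set()

    def enter_blocking(self):
        """Call just before a blocking communication call."""
        self.blocked.set()

    def leave_blocking(self):
        """Call right after the blocking communication call returns."""
        self.blocked.clear()


def demo():
    state = bytearray(b"application state to checkpoint")
    sink = io.BytesIO()
    saver = IdleTimeCheckpointSaver(sink, chunk_size=8)
    saver.start(bytes(state))      # eager copy standing in for copy-on-write
    state[0:11] = b"APPLICATION"   # later writes do not change the snapshot
    saver.enter_blocking()         # e.g. just before a blocking receive
    saver.done.wait(timeout=5)     # for the demo, stay "blocked" until saving ends
    saver.leave_blocking()
    return sink.getvalue()
```

In this sketch, `chunk_size` plays the role of the state-saving granularity. A chunk that starts while the application is blocked may still be in progress when computation resumes, so smaller chunks limit that overlap at the cost of more frequent checks. This trade-off is an assumption of the sketch, not a result taken from the paper.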
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.