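Asynchronous Application-Level Checkpointing in Heterogeneous Systems
Cite this article: JIA Jia. Asynchronous Application-Level Checkpointing in Heterogeneous Systems[J]. Computer Engineering & Science, 2011, 33(11).
Author: JIA Jia
Affiliation: National Laboratory for Parallel and Distributed Processing, Changsha, Hunan 410073, China
Funding: National Natural Science Foundation of China (60921062, 61003087)
Abstract: Application-level checkpointing is the most widely used and mature fault-tolerance technique on homogeneous systems, but its use on heterogeneous systems is still in its infancy: there is as yet no rigorous, well-founded implementation scheme or configuration method tailored to the architecture and fault model of heterogeneous systems. To address this, based on the architecture and programming model of CUDA heterogeneous systems, this paper analyzes how CUDA programs execute on the CPU and the GPU, proposes an asynchronous execution mechanism for application-level checkpointing on heterogeneous systems, and, building on this mechanism, discusses the problem of optimal checkpoint placement and designs an optimization scheme. Finally, three case studies on the CUDA platform verify the feasibility and practicality of the technique, and its performance is evaluated. The results show that the asynchronous execution mechanism for application-level checkpointing on CPU-GPU heterogeneous systems is effective and, compared with a checkpointing mechanism in which the CPU and GPU execute synchronously, is more flexible to configure and leaves more room for optimization. The checkpoint placement method proposed on top of this mechanism also effectively reduces checkpointing overhead, thereby achieving better fault-tolerance performance.

Keywords: application-level checkpointing  heterogeneous system  asynchronous execution mechanism  optimal placement of checkpoints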

Asynchronous Application-Level Checkpointing in Heterogeneous Systems
JIA Jia. Asynchronous Application-Level Checkpointing in Heterogeneous Systems[J]. Computer Engineering & Science, 2011, 33(11).
Authors: JIA Jia
Affiliation: National Laboratory for Parallel and Distributed Processing, Changsha 410073, China
Abstract: Application-level checkpointing is one of the most commonly used and mature fault-tolerance techniques in homogeneous systems. However, it is still in its infancy in heterogeneous systems, and there is not yet a rigorous and well-founded implementation scheme or configuration method tailored to the architectures and fault models of heterogeneous systems. Motivated by this observation, and based on the architecture and programming model of the CUDA heterogeneous system, this paper analyzes the execution mode of CUDA program...
Keywords: application-level checkpointing  heterogeneous system  asynchronous execution mechanism  optimal placement of checkpoints
This article is indexed in CNKI, Wanfang Data, and other databases.