
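Pipelined Compressed Checkpointing for Heterogeneous Parallel Computing Systems
Cite this article: LIU Yong-peng, WANG Feng, LU Kai, LIU Yong-yan. Pipelined Compressed Checkpointing for Heterogeneous Parallel Computing Systems[J]. Acta Electronica Sinica, 2012, 40(2): 223-229.
Authors: LIU Yong-peng  WANG Feng  LU Kai  LIU Yong-yan
Affiliation: 1. College of Computer, National University of Defense Technology, Changsha, Hunan 410073; 2. Information Center, Ministry of Science and Technology of China, Beijing 100862
Funding: Major Project of the National High Technology Research and Development Program of China (863 Program); Open Fund of the State Key Laboratory of High-end Server & Storage Technology; National Natural Science Foundation of China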
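Abstract: In large-scale parallel computing systems, a parallel checkpoint triggers a large number of nodes to save their computation state at the same time, which incurs a large file storage overhead and puts heavy access pressure on the communication and storage systems. Data compression can shrink checkpoint files and thereby reduce both the storage overhead and the access pressure on the communication and storage systems, but it also introduces extra computation overhead for compression. Targeting heterogeneous parallel computing systems, this paper proposes a pipelined parallel compressed checkpointing technique that applies a series of optimizations to reduce the computation latency introduced by compression, including pipelined double write-buffer queues, merging of file write operations, a GPU-accelerated pipelined compression algorithm, and multi-process scheduling of GPU resources. The paper describes the implementation of this technique on the Tianhe-1 system and presents a comprehensive evaluation of the resulting checkpointing system. The experimental data show that the approach is feasible, efficient, and practical in large-scale heterogeneous parallel computing systems.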

Keywords: heterogeneous parallel architecture  checkpoint  data compression  software pipeline  graphics processing unit
Received: 2010-09-20

Pipelined Compressed Checkpointing for Heterogeneous Systems
LIU Yong-peng, WANG Feng, LU Kai, LIU Yong-yan. Pipelined Compressed Checkpointing for Heterogeneous Systems[J]. Acta Electronica Sinica, 2012, 40(2): 223-229.
Authors: LIU Yong-peng  WANG Feng  LU Kai  LIU Yong-yan
Affiliation: 1. College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China; 2. Information Center, Ministry of Science and Technology of China, Beijing 100862, China
Abstract: Checkpointing is an effective technique for improving the reliability of large-scale parallel computing systems. Data compression is a promising way to reduce both the size of the data saved to files in the storage subsystem and the amount of data passing through the communication subsystem. However, compression incurs substantial time overhead, which is the main technical barrier to its practical use. In this paper, we propose a parallel compressed checkpointing technique that reduces the time overhead of compression in heterogeneous architectures. It integrates a number of optimization techniques, including transmitting checkpoint data between host and GPU in buffered pipelines, aggregating file write operations, employing a pipelined parallel compression algorithm, and delegating compression operations to the GPU. The paper reports an implementation of the technique in the TH-1 system and evaluation experiments on that system. The experimental data show that the technique is efficient and practically usable.
Keywords: heterogeneous architecture  checkpoint  data compression  pipeline  graphics processing unit (GPU)
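
The abstract names buffered pipelines, a pipelined compression stage, and aggregation of file writes, without giving implementation details. The following is a minimal sketch of how those three ideas can fit together, under loud assumptions: CPU threads and `zlib` stand in for the GPU compression stage, and every name and constant here (`checkpoint`, `CHUNK_SIZE`, `FLUSH_SIZE`, `depth`) is hypothetical rather than taken from the paper.

```python
import io
import queue
import threading
import zlib

CHUNK_SIZE = 64 * 1024    # hypothetical size of one pipeline input chunk
FLUSH_SIZE = 256 * 1024   # hypothetical threshold for one aggregated write
_DONE = object()          # sentinel marking the end of the stream


def _compress_stage(raw_q, packed_q):
    """Compress chunks as they arrive (CPU zlib stands in for the GPU)."""
    compressor = zlib.compressobj()
    while True:
        chunk = raw_q.get()
        if chunk is _DONE:
            packed_q.put(compressor.flush())  # emit any buffered output
            packed_q.put(_DONE)
            return
        packed_q.put(compressor.compress(chunk))


def _write_stage(packed_q, sink, stats):
    """Merge small compressed pieces into fewer, larger writes."""
    pending = bytearray()
    while True:
        piece = packed_q.get()
        if piece is _DONE:
            break
        pending += piece
        if len(pending) >= FLUSH_SIZE:
            sink.write(bytes(pending))
            stats["writes"] += 1
            pending.clear()
    if pending:  # final partial buffer
        sink.write(bytes(pending))
        stats["writes"] += 1


def checkpoint(state: bytes, sink, depth: int = 4) -> dict:
    """Stream `state` through two bounded queues: chunk -> compress -> write."""
    raw_q = queue.Queue(maxsize=depth)     # first buffer queue (uncompressed)
    packed_q = queue.Queue(maxsize=depth)  # second buffer queue (compressed)
    stats = {"chunks": 0, "writes": 0}
    workers = [
        threading.Thread(target=_compress_stage, args=(raw_q, packed_q)),
        threading.Thread(target=_write_stage, args=(packed_q, sink, stats)),
    ]
    for worker in workers:
        worker.start()
    for offset in range(0, len(state), CHUNK_SIZE):
        raw_q.put(state[offset:offset + CHUNK_SIZE])  # blocks when queue is full
        stats["chunks"] += 1
    raw_q.put(_DONE)
    for worker in workers:
        worker.join()
    return stats


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of synthetic, compressible state
    out = io.BytesIO()
    print(checkpoint(data, out), len(out.getvalue()))
```

Because both queues are bounded, chunking, compression, and writing overlap in time while memory use stays limited to a few chunks per stage. The write stage issues one `write` per `FLUSH_SIZE` bytes of compressed output instead of one per chunk, which is the sense in which writes are aggregated in this sketch.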
This article is indexed in CNKI, Wanfang Data, and other databases.