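A Task-Based Parallel Programming Model Supporting Fault Tolerance
Cite this article: WANG Yi-Zhuo, CHEN Xu, JI Wei-Xing, SU Yan, WANG Xiao-Jun, SHI Feng. A task-based parallel programming model supporting fault tolerance[J]. Journal of Software, 2016, 27(7): 1789-1804.
Authors: WANG Yi-Zhuo, CHEN Xu, JI Wei-Xing, SU Yan, WANG Xiao-Jun, SHI Feng
Affiliation (all authors): School of Computer Science, Beijing Institute of Technology, Beijing 100081
Funding: National Natural Science Foundation of China (61300011)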
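Abstract: Task-based parallel programming models have become the mainstream of parallel programming; they improve the performance of parallel computers by exploiting task parallelism. This paper proposes a task-based parallel programming model that supports fault tolerance, integrating fault-tolerance techniques into the task-based model so that system reliability improves while performance is maintained. The model uses the task as the basic unit of scheduling, execution, error detection, and recovery, and implements fault-tolerance support at the application level. A Buffer-Commit computation model supports the detection of and recovery from transient errors; application-level diskless checkpointing enables recovery from permanent errors caused by node failures; and a fault-tolerant work-stealing task scheduling strategy provides dynamic load balancing. Experimental results show that the model provides fault tolerance against hardware errors at a low performance overhead.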

Keywords: parallel programming; fault tolerance; task parallelism; work-stealing scheduling; load balancing
Received: 2014-12-31
Revised: 2015-03-02
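The abstract names a Buffer-Commit computation model for detecting and recovering from transient errors but does not say how detection is performed. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: a task writes only into a private buffer, the runtime detects a transient fault by executing the task twice and comparing the two buffers (duplicate execution is an assumed detection method), and the buffer is committed to shared state only when both copies agree; otherwise both buffers are discarded and the task is re-executed. The names `run_buffered`, `FaultDetected`, `sum_task`, and the injected bit flip are hypothetical.

```python
from typing import Any, Callable, Dict

Buffer = Dict[str, Any]


class FaultDetected(Exception):
    """Raised when the redundant executions of a task never agree."""


def run_buffered(task: Callable[[Buffer], Buffer], shared: Buffer,
                 max_retries: int = 3) -> int:
    """Run `task` under a buffer-then-commit discipline; return attempts used.

    Assumption for this sketch: transient faults are detected by executing
    the task twice on the same inputs and comparing the two output buffers.
    """
    for attempt in range(1, max_retries + 1):
        # Each execution reads a private copy of the inputs and returns its
        # writes in a private buffer, so shared state is never touched here.
        buf_a = task(dict(shared))
        buf_b = task(dict(shared))
        if buf_a == buf_b:
            shared.update(buf_a)  # commit: publish the verified writes
            return attempt
        # Mismatch: treat it as a transient fault, drop both buffers, retry.
    raise FaultDetected(f"no agreement after {max_retries} attempts")


# Usage: a hypothetical task whose first execution suffers an injected bit flip.
calls = {"n": 0}


def sum_task(inputs: Buffer) -> Buffer:
    calls["n"] += 1
    total = inputs["a"] + inputs["b"]
    if calls["n"] == 1:
        total ^= 1 << 4  # simulated transient fault (hypothetical injection)
    return {"c": total}


shared = {"a": 2, "b": 3}
attempts = run_buffered(sum_task, shared)
print(attempts, shared["c"])  # 2 5
```

In this sketch nothing reaches shared state before the commit, so a detected fault costs only the re-execution of one task, which is what makes the task a convenient unit of recovery.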

Task-Based Parallel Programming Model Supporting Fault Tolerance
WANG Yi-Zhuo, CHEN Xu, JI Wei-Xing, SU Yan, WANG Xiao-Jun, SHI Feng. Task-Based Parallel Programming Model Supporting Fault Tolerance[J]. Journal of Software, 2016, 27(7): 1789-1804.
Authors: WANG Yi-Zhuo, CHEN Xu, JI Wei-Xing, SU Yan, WANG Xiao-Jun, and SHI Feng
Affiliation (all authors): School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
Abstract: The task-based parallel programming model has become the mainstream parallel programming model; it improves the performance of parallel computer systems by exploiting task parallelism. This paper presents a novel task-based parallel programming model that supports hardware fault tolerance. The model incorporates fault-tolerance mechanisms into the task-based parallel programming model and aims to improve system reliability while maintaining performance. It uses the task as the basic unit of scheduling, execution, fault detection, and recovery, and supports fault tolerance at the application level. A buffer-commit computation model is used to tolerate transient faults, and an application-level diskless checkpointing technique is employed to tolerate permanent faults caused by node failures. A work-stealing scheduling scheme that supports fault tolerance is adopted to achieve dynamic load balancing. Experimental results show that the proposed model provides hardware fault tolerance with low performance overhead.
Keywords: parallel programming; fault tolerance; task parallelism; work-stealing scheduling; load balancing
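For permanent faults, the abstract mentions application-level diskless checkpointing without giving the encoding. One classic diskless scheme, assumed here purely for illustration, keeps each node's checkpoint in memory and stores the bytewise XOR of all checkpoints on an extra node; when a single node fails, its checkpoint is rebuilt by XOR-ing the parity with the surviving checkpoints. The functions `xor_bytes`, `encode_parity`, and `recover` are hypothetical names, and the sketch tolerates only one simultaneous failure.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(checkpoints: List[bytes]) -> bytes:
    """XOR of all per-node checkpoints, zero-padded to a common width.

    In the assumed scheme this parity block lives in the memory of an extra
    node rather than on disk.
    """
    width = max(len(c) for c in checkpoints)
    return reduce(xor_bytes, (c.ljust(width, b"\0") for c in checkpoints))


def recover(lost: int, survivors: Dict[int, bytes], parity: bytes,
            lengths: List[int]) -> bytes:
    """Rebuild the checkpoint of node `lost` from parity and the survivors.

    Handles one failed node at a time; `lengths` (original checkpoint
    sizes) is assumed to be replicated on every node.
    """
    width = len(parity)
    acc = parity
    for ckpt in survivors.values():
        acc = xor_bytes(acc, ckpt.ljust(width, b"\0"))
    return acc[:lengths[lost]]  # strip the zero padding


# Usage: three nodes checkpoint their state, then node 1 fails.
states = [b"node0-state", b"n1", b"node2 partial sums"]
lengths = [len(s) for s in states]
parity = encode_parity(states)
survivors = {rank: s for rank, s in enumerate(states) if rank != 1}
restored = recover(1, survivors, parity, lengths)
print(restored)  # b'n1'
```

A single parity block costs one extra node's memory instead of stable-storage I/O, which is the usual motivation for diskless schemes; surviving several simultaneous failures would require a stronger erasure code or replicated checkpoints.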