Cost-oriented proactive fault tolerance approach to high performance computing (HPC) in the cloud

Authors:	Ifeanyi P. Egwutuoha, Shiping Chen, David Levy, Bran Selic, Rafael Calvo

Affiliation:	1. School of Electrical and Information Engineering, The University of Sydney, NSW 2006, Australia; 2. CSIRO, Information Engineering Laboratory, CSIRO ICT Centre, Sydney, NSW, Australia

Abstract:	Cloud computing offers new computing paradigms, capacity and flexible solutions to high performance computing (HPC) applications. For example, Hardware as a Service (HaaS) allows users to provision a large number of virtual machines (VMs) for computation-intensive applications. Because an HPC system in the cloud comprises a large number of VMs and electronic components, any fault during execution would require re-running the application, which costs time, money and energy. In this paper we present a proactive fault tolerance (FT) approach to HPC systems in the cloud that reduces the wall-clock execution time and dollar cost in the presence of faults. We also develop a generic FT algorithm for HPC systems in the cloud; the algorithm does not rely on a spare node being available before a failure is predicted. In addition, we develop a cost model for executing computation-intensive applications on HPC systems in the cloud, and we analyse the dollar cost of provisioning spare nodes and of checkpointing FT to assess the value of our approach. Our experimental results, obtained from a real cloud execution environment, show that the wall-clock execution time and cost of running computation-intensive applications in the cloud can be reduced by as much as 30%. Compared with current FT approaches, our FT approach for HPC in the cloud can reduce the frequency of checkpointing of computation-intensive applications by up to 50%.

Keywords:	HPC; Cloud computing; HaaS; proactive fault tolerance; computation-intensive
