首页 | 本学科首页   官方微博 | 高级检索  
     


An I/O-efficient and adaptive fault-tolerant framework for distributed graph computations
Authors:Zhigang Wang  Yu Gu  Yubin Bao  Ge Yu  Lixin Gao
Affiliation:1.School of Computer Science and Engineering,Northeastern University,Shenyang,China;2.Department of Electrical and Computer Engineering,University of Massachusetts Amherst,Amherst,USA
Abstract:In recent year, many large-scale iterative graph computation systems such as Pregel have been developed. To ensure that these systems are fault-tolerant, checkpointing, which archives graph states onto distributed file systems periodically, has been proposed. However, fault-tolerance remains to be challenging because the whole data set is archived with a static interval, rendering underlying graph computations to entail I/O-costs in terms of disk and network communication. Motivated by this, we first propose to dynamically adjust checkpoint intervals based on a carefully designed cost-analysis model, by taking the underlying computing workload into account. Furthermore, for algorithms that can be restarted from any point during computations, we prioritize graph states and then checkpointing can be performed with selected data, instead of the entire dataset, to reduce archiving overhead while simultaneously guaranteeing the failure recovery efficiency. Finally, we conduct extensive performance studies to confirm the effectiveness of our approaches over existing up-to-date solutions using a broad spectrum of real-world graphs.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号