Clustered checkpointing: Maximizing the level of confidence for non-equidistant checkpointing
Affiliation: 1. Department of Electrical and Computer Engineering, University of California at Riverside, Riverside, CA 92521, USA; 2. Mentor Graphics Corporation, Fremont, CA 94528, USA; 3. Department of Computer Science and Engineering, University of California at Riverside, Riverside, CA 92521, USA; 1. Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran; 2. Department of Informatics, Technical University of Munich, Munich, Germany; 3. E-JUST Center, Graduate School of Information Science and Electrical Eng., Kyushu University, Fukuoka, Japan; 1. Department of Information Engineering, Electronics and Telecommunications (D.I.E.T.), Sapienza University of Rome, via Eudossiana 18, 00184 Rome, Italy; 2. Center for Life Nano Science@Sapienza, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161 Rome, Italy; 3. Division of Health Protection Technologies, ENEA, via Anguillarese 301, 00123 Rome, Italy; 1. Malaysia-Japan International Institute of Technology, Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, 54100 Kuala Lumpur, Malaysia; 2. Integrated Circuit Engineering, 14387, Taman Paik Siong, Batu 71/2, Jalan Puchong, 47180 Puchong, Selangor Darul Ehsan, Malaysia; 3. Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan; 1. Electronics Laboratory, Physics Department, University of Patras, Patras, 26504, Greece; 2. Department of Electrical Engineering, Technological Educational Institute of Western Greece, Patras, 26334, Greece
Abstract: Employing fault tolerance often introduces a time overhead, which may cause deadline violations in real-time systems (RTS). For RTS it is therefore important to optimize fault tolerance techniques so that the probability of meeting the deadlines, i.e. the Level of Confidence (LoC), is maximized. Previous studies have focused on evaluating the LoC for equidistant checkpointing, but none have addressed the problem of evaluating the LoC for non-equidistant checkpointing. In this work, we provide an expression to evaluate the LoC for non-equidistant checkpointing. Further, we detail an exhaustive search approach to find the distribution of a given number of checkpoints that results in the maximal LoC. Since the exhaustive search approach is very time-consuming, we propose the Clustered Checkpointing method, a heuristic that distributes checkpoints in a number of clusters with the goal of maximizing the LoC. The results show that the LoC can be improved when non-equidistant checkpointing is used. Further, the results indicate that the proposed Clustered Checkpointing method is capable of finding the distribution that results in the maximal LoC in much shorter time than the exhaustive search approach, while considering only a few clusters.
Keywords: Fault tolerance; Reliability analysis; Real-time systems; Checkpointing
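
To make the exhaustive search described in the abstract more concrete, the following is a minimal Python sketch under a simplified fault model that is an assumption of this example, not the expression derived in the paper. It assumes that a job of integer length `total` is split into segments, that each attempt at a segment of length `seg` costs `seg + tau` time units (where `tau` is a hypothetical per-checkpoint overhead), that an attempt is error-free with probability `p_job ** (seg / total)`, and that a failed attempt is simply repeated. The function names `loc`, `compositions`, and `best_distribution` are illustrative only. The Clustered Checkpointing heuristic itself is not reproduced here.

```python
from itertools import combinations


def loc(segments, tau, deadline, p_job):
    """Probability of finishing by `deadline` under the ASSUMED model:
    each attempt at a segment costs seg + tau and succeeds with
    probability p_job ** (seg / total); failed attempts are repeated."""
    total = sum(segments)
    # dist[t] = probability that all segments so far finished at exactly time t
    dist = {0: 1.0}
    for seg in segments:
        p_ok = p_job ** (seg / total)
        cost = seg + tau
        new_dist = {}
        for t, prob in dist.items():
            p_failed_so_far = 1.0  # probability that all earlier attempts failed
            finish = t + cost
            while finish <= deadline:
                new_dist[finish] = new_dist.get(finish, 0.0) + prob * p_failed_so_far * p_ok
                p_failed_so_far *= 1.0 - p_ok
                finish += cost
        dist = new_dist
    return sum(dist.values())


def compositions(total, parts):
    """Yield every way to split `total` into `parts` positive integer segments."""
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0,) + cuts + (total,)
        yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


def best_distribution(total, parts, tau, deadline, p_job):
    """Exhaustive search: return the segment lengths with the highest LoC."""
    return max(compositions(total, parts),
               key=lambda segs: loc(segs, tau, deadline, p_job))


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    best = best_distribution(total=12, parts=3, tau=1, deadline=20, p_job=0.9)
    print("best segments:", best, "LoC:", loc(best, 1, 20, 0.9))
    print("equidistant LoC:", loc((4, 4, 4), 1, 20, 0.9))
```

The number of candidate distributions grows combinatorially with the job length and the number of checkpoints, which is consistent with the abstract's remark that exhaustive search is very time-consuming and motivates a heuristic.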
This article is indexed in ScienceDirect and other databases.