

Transparent three-phase Byzantine fault tolerance for parallel and distributed simulations
Affiliation: 1. Institute of High Performance Computing, 1 Fusionopolis Way, Singapore 138632, Singapore; 2. Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
Abstract: A parallel and distributed simulation (federation) is composed of a number of simulation components (federates). Since the federates may be developed by different participants and executed on different platforms, they are subject to Byzantine failures. Moreover, a failure may propagate through the federation, resulting in an epidemic effect. In this article, a three-phase (i.e., detection, location, and recovery) Byzantine Fault Tolerance (BFT) mechanism is proposed based on a transparent middleware approach. Replication, checkpointing, and message logging techniques are integrated in the mechanism to enhance simulation performance and reduce the cost of fault tolerance. In addition, mechanisms are provided to remove the epidemic effects of Byzantine failures. Our experiments have verified the correctness of the three-phase BFT mechanism and illustrated its high efficiency and good scalability. For some simulation executions, the BFT mechanism may even achieve performance enhancement and Byzantine fault tolerance simultaneously.
Keywords: Parallel and distributed simulation; Byzantine fault tolerance; Replication; Checkpoint; Epidemic effect; Time synchronization
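
The three phases named in the abstract (detection, location, recovery) can be illustrated with a minimal Python sketch. This is not the mechanism from the article: it assumes detection by comparing replica outputs, location by strict-majority voting, and recovery by restoring a checkpoint. The function names, replica ids, and message strings are all hypothetical, and message-log replay after the rollback is omitted.

```python
from collections import Counter
import copy


def detect(outputs):
    """Phase 1 (assumed): a fault is detected when replica outputs disagree."""
    return len(set(outputs.values())) > 1


def locate(outputs):
    """Phase 2 (assumed): replicas that deviate from the strict-majority
    output are treated as suspects."""
    value, count = Counter(outputs.values()).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise ValueError("no strict majority; faulty replicas cannot be located")
    suspects = sorted(r for r, out in outputs.items() if out != value)
    return value, suspects


def recover(states, suspects, checkpoint):
    """Phase 3 (assumed): reset each suspect replica to a saved checkpoint.
    Replaying logged messages to catch up is left out of this sketch."""
    for replica in suspects:
        states[replica] = copy.deepcopy(checkpoint)
    return states


# Hypothetical data: three replicas of one federate, one of them faulty.
outputs = {"r1": "event@t=10", "r2": "event@t=10", "r3": "event@t=99"}
states = {"r1": {"t": 10}, "r2": {"t": 10}, "r3": {"t": 99}}
checkpoint = {"t": 5}

faulty = detect(outputs)
agreed, suspects = locate(outputs)
states = recover(states, suspects, checkpoint)
```

With three replicas, a strict majority tolerates one deviating replica. Handling more faulty replicas, or replicas that collude, would need a larger replica set than this sketch assumes.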
This article is indexed in ScienceDirect and other databases.