DAFT: Decoupled Acyclic Fault Tolerance |
| |
Authors: | Yun Zhang, Jae W. Lee, Nick P. Johnson, David I. August |
| |
Affiliation: | 1. Department of Computer Science, Princeton University, 35 Olden St., Princeton, NJ, 08540, USA
|
| |
Abstract: | Higher transistor counts, lower voltage levels, and reduced noise margin increase the susceptibility of multicore processors
to transient faults. Redundant hardware modules can detect such faults, but software techniques are more appealing for their
low cost and flexibility. Recent software proposals have not achieved widespread acceptance because they either increase register
pressure, double memory usage, or are too slow in the absence of hardware extensions. This paper presents DAFT, a fast, safe,
and memory-efficient transient fault detection framework for commodity multicore systems. DAFT replicates computation across
multiple cores and schedules fault detection off the critical path. Where possible, values are speculated to be correct and
only communicated to the redundant thread at essential program points. DAFT is implemented in the LLVM compiler framework
and evaluated using SPEC CPU2000 and SPEC CPU2006 benchmarks on a commodity multicore system. Evaluation results demonstrate
that speculation allows DAFT to improve the performance of software redundant multithreading by 2.17× with no degradation
of fault coverage. |
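
The general idea of redundant multithreading with checks kept off the critical path can be illustrated with a toy sketch. This is not DAFT's LLVM-based implementation: the function `run_redundant`, the `inject_fault_at` parameter, and the simulated bit flip are all hypothetical names and assumptions introduced here for illustration. A leading thread computes results and keeps going without waiting, while a trailing thread recomputes each value and records any mismatch.

```python
import queue
import threading


def run_redundant(compute, inputs, inject_fault_at=None):
    """Toy redundant execution: returns (outputs, indices of detected mismatches).

    Hypothetical helper for illustration only; it is not part of DAFT.
    """
    channel = queue.Queue()   # one-way channel from leading to trailing thread
    outputs = []
    mismatches = []

    def leading():
        for i, x in enumerate(inputs):
            y = compute(x)
            if i == inject_fault_at:
                y ^= 1  # assumption: simulate a transient single-bit flip
            # Speculate that y is correct: record it and move on
            # without waiting for the trailing thread's check.
            outputs.append(y)
            channel.put((i, x, y))
        channel.put(None)  # sentinel: no more work

    def trailing():
        while True:
            item = channel.get()
            if item is None:
                break
            i, x, y = item
            # Redundant computation; the comparison happens off the
            # leading thread's critical path.
            if compute(x) != y:
                mismatches.append(i)

    threads = [threading.Thread(target=leading), threading.Thread(target=trailing)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs, mismatches
```

Calling `run_redundant(lambda x: x * x, [1, 2, 3, 4], inject_fault_at=2)` should report index 2 as a mismatch. The one-way queue means the leading thread never blocks on the checker, which is the property the abstract attributes to decoupling; in CPython the two threads do not truly run in parallel, so the sketch shows structure, not performance.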
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|