Building a global clock for observing computations in distributed memory parallel computers |
| |
Authors: | Jean-Marc Jézéquel, Claude Jard |
| |
Abstract: | A common time reference (i.e., a global clock) is needed to observe the behavior of a distributed algorithm on a distributed computing system. The paper presents a pragmatic algorithm for building a global clock on any distributed system, which is optimal for homogeneous distributed memory parallel computers (DMPCs). To observe and sort concurrent events in common DMPCs, we need a global clock with a resolution finer than the variance of the message transfer time, which is beyond what deterministic and fault-tolerant algorithms can achieve. A statistical method is therefore chosen as the building block for an original algorithm valid for any topology. Its main novelty over related approaches is that it copes with clock granularity when computing frequency offsets between local clocks, achieving a resolution comparable to that of the physical clocks. The algorithm is particularly well suited to debugging distributed algorithms by means of trace recordings, because after its acquisition step it induces no message overhead: the perturbation of the execution remains as small as possible. It has been implemented on various DMPCs (Intel iPSC/2 hypercube and Paragon XP/S, Transputer-based networks, and Sun networks), so we can provide some data about its behavior and performance on these machines. |
| |
Keywords: | |