Abstract: | Developing a distributed debugger is much more complex than developing a sequential debugger. This added complexity is mainly due to the non-determinism of events that communication delays introduce into distributed systems. We explore the problems that one must address when designing a distributed program debugger and then describe our design and implementation of DPD (distributed program debugger). Problems addressed include non-determinism of events, finding consistent system states, setting breakpoints, recording events, and checkpointing. Important features of DPD include dynamic roll back and replay, as well as a graphical user interface. DPD has been tested successfully in debugging distributed programs within a distributed facility called REM (remote execution manager). |