Check out the new USENIX Web site.


USENIX, The Advanced Computing Systems Association

2006 USENIX Annual Technical Conference Abstract

Pp. 289–300 of the Proceedings

Awarded Best Paper!

Replay Debugging for Distributed Applications

Dennis Geels, Gautam Altekar, Scott Shenker, and Ion Stoica, University of California, Berkeley

Abstract

We have developed a new replay debugging tool, liblog, for distributed C/C++ applications. It logs the execution of deployed application processes and replays them deterministically, faithfully reproducing race conditions and non-deterministic failures, enabling careful offline analysis.

To our knowledge, liblog is the first replay tool to address the requirements of large distributed systems: lightweight support for long-running programs, consistent replay of arbitrary subsets of application nodes, and operation in a mixed environment of logging and non-logging processes. In addition, it requires no special hardware or kernel patches, supports unmodified application executables, and integrates GDB into the replay mechanism for simultaneous source-level debugging of multiple processes.

This paper presents liblog's design, an evaluation of its runtime overhead, and a discussion of our experience with the tool to date.
  • View the full text of this paper in HTML and PDF. Listen to the presentation in MP3 format.
    The Proceedings are published as a collective work, © 2006 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.

  • If you need the latest Adobe Acrobat Reader, you can download it from Adobe's site.
To become a USENIX Member, please see our Membership Information.

Last changed: 15 Sept. 2006 ch