general information

Early Bird Registration Deadline: March 16, 2016

SREcon16 is SOLD OUT.
No walkup registrations will be accepted.

Venue:
Hyatt Regency Santa Clara
5101 Great America Pkwy
Santa Clara, CA 95054

Rooms at the Hyatt Regency Santa Clara are sold out.

Rooms available at:
Biltmore Hotel & Suites
2151 Laurelwood Road
Santa Clara, CA 95054

Book your room for $225 single or double plus tax or call (800) 255-9925 or (408) 988-8411 and reference USENIX Association or Billing ID #32992. Room rate includes WiFi and complimentary shuttle to the Hyatt Regency Santa Clara.

Debugging Distributed Systems

Donny Nadolny, PagerDuty

Abstract: 

Despite our best efforts, our systems fail. Sometimes it’s our fault: code that we wrote or bugs that we caused. But sometimes the fault lies with systems that we rely on.

ZooKeeper is a very useful distributed system that is often used as a building block for other distributed systems, like Kafka and Spark. PagerDuty uses it for many critical systems, and for five months it failed on us frequently.

We will walk through the process of finding and fixing one cause of many of these failures. You will learn how to use various tools to stress-test the network, along with some intricate details of how ZooKeeper works and possibly more than you wanted to know about TCP, including an example of machines having a different view of the state of a TCP stream.

Donny Nadolny is a Scala developer at PagerDuty, working on improving the reliability of their backend systems. He spends a large amount of time investigating problems experienced with distributed systems like Cassandra and ZooKeeper.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@conference {208606,
author = {Donny Nadolny},
title = {Debugging Distributed Systems},
year = {2016},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = apr,
}
© USENIX

SREcon is a registered trademark of the USENIX Association.
