007: Democratically Finding the Cause of Packet Drops

Authors: 

Behnaz Arzani, Microsoft Research; Selim Ciraci, Microsoft; Luiz Chamon, University of Pennsylvania; Yibo Zhu and Hongqiang (Harry) Liu, Microsoft Research; Jitu Padhye, Microsoft; Boon Thau Loo, University of Pennsylvania; Geoff Outhred, Microsoft

Abstract: 

Network failures continue to plague datacenter operators because their symptoms may not correlate directly with where or why they occur. We introduce 007, a lightweight, always-on diagnosis application that can find problematic links and also pinpoint the cause of problems for individual TCP connections. 007 runs entirely within the end host. During its two-month deployment in a tier-1 datacenter, it detected every problem found by previously deployed monitoring tools and also found the sources of other problems that had previously gone undetected.
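The abstract does not spell out the mechanism behind the "democratic" diagnosis in the title. The sketch below illustrates one plausible voting scheme under stated assumptions. It is not the authors' implementation. It assumes that each connection that suffers drops knows the links on its path (for example, from a path trace), and that it splits one vote evenly across the h links of that path, giving each link 1/h. Links with the most accumulated votes are ranked as most likely faulty. A single connection is blamed on the highest-voted link on its own path. The function names `tally_votes`, `rank_links`, and `blame_connection` and the string link IDs are hypothetical.

```python
from collections import defaultdict


def tally_votes(failed_flow_paths):
    """Accumulate votes per link (hypothetical helper).

    Each element of failed_flow_paths is the list of link IDs on the path of
    one connection that saw drops. Assumption: every link on a path of length
    h receives 1/h of that connection's single vote.
    """
    votes = defaultdict(float)
    for path in failed_flow_paths:
        if not path:  # skip connections whose path is unknown
            continue
        share = 1.0 / len(path)
        for link in path:
            votes[link] += share
    return dict(votes)


def rank_links(votes):
    """Return link IDs ordered from most to fewest votes."""
    return sorted(votes, key=votes.get, reverse=True)


def blame_connection(path, votes):
    """Blame the link on this connection's path that has the most votes."""
    return max(path, key=lambda link: votes.get(link, 0.0))


if __name__ == "__main__":
    # Three connections with drops; they share the hypothetical link "B-C".
    paths = [
        ["A-B", "B-C", "C-D"],
        ["E-B", "B-C", "C-F"],
        ["G-B", "B-C"],
    ]
    votes = tally_votes(paths)
    print(rank_links(votes))
    print(blame_connection(paths[0], votes))
```

Because every connection casts only one vote in total, a long path cannot outvote a short one, and a link shared by many failing connections stands out even though no single connection can see which hop dropped its packets.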

BibTeX
@inproceedings {211229,
author = {Behnaz Arzani and Selim Ciraci and Luiz Chamon and Yibo Zhu and Hongqiang (Harry) Liu and Jitu Padhye and Boon Thau Loo and Geoff Outhred},
title = {007: Democratically Finding the Cause of Packet Drops},
booktitle = {15th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 18)},
year = {2018},
isbn = {978-1-939133-01-4},
address = {Renton, WA},
pages = {419--435},
url = {https://www.usenix.org/conference/nsdi18/presentation/arzani},
publisher = {{USENIX} Association},
month = apr,
}