Network Error Logging: Client-side measurement of end-to-end web service reliability

Authors: 

Sam Burnett and Lily Chen, Google; Douglas A. Creager, GitHub; Misha Efimov, Ilya Grigorik, and Ben Jones, Google; Harsha V. Madhyastha, Google and University of Michigan; Pavlos Papageorge, Brian Rogan, Charles Stahl, and Julia Tuttle, Google

Abstract: 

We present NEL (Network Error Logging), a planet-scale, client-side, network reliability measurement system. NEL is implemented in Google Chrome and has been proposed as a new W3C standard, letting any web site operator collect reports of clients’ successful and failed requests to their sites. These reports are similar to web server logs, but include information about failed requests that never reach serving infrastructure. Reports are uploaded via redundant failover paths, reducing the likelihood of shared-fate failures of report uploads. We have designed NEL such that service providers can glean no additional information about users or their behavior compared to what services already have visibility into during normal operation. Since 2014, NEL has been invaluable in monitoring all of Google’s domains, allowing us to detect and investigate instances of DNS hijacking, BGP route leaks, protocol deployment bugs, and other problems where packets might never reach our servers. This paper presents the design of NEL, case studies of real outages, and deployment lessons for other operators who choose to use NEL to monitor their traffic.

NSDI '20 Open Access Sponsored by NetApp

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {246502,
author = {Sam Burnett and Lily Chen and Douglas A. Creager and Misha Efimov and Ilya Grigorik and Ben Jones and Harsha V. Madhyastha and Pavlos Papageorge and Brian Rogan and Charles Stahl and Julia Tuttle},
title = {Network Error Logging: Client-side measurement of end-to-end web service reliability },
booktitle = {17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20)},
year = {2020},
isbn = {978-1-939133-13-7},
address = {Santa Clara, CA},
pages = {985--998},
url = {https://www.usenix.org/conference/nsdi20/presentation/burnett},
publisher = {USENIX Association},
month = feb
}

Presentation Video