Measuring End-User Availability via the Network Error Logging W3C API

Due to the evolving Coronavirus/COVID-19 situation, SREcon20 Americas West has been rescheduled to June 2–4, 2020.

Thursday, March 26, 2020 - 12:05 pm–12:40 pm

Mohit Suley, Microsoft

Abstract: 

Users of online services around the world experience issues with DNS resolution, TCP connections or SSL certificates on a regular basis. If you believe all your customers reached your online service, think again.

We will explain how an SRE team can leverage the new W3C Network Error Logging (NEL) API to get an availability telemetry feed that is automatically available from all (Chromium-based) endpoints. We will show what it takes to set up a pipeline to get this running, and we will walk the audience through actual examples caught from real-user traffic that show the potential of this telemetry system.
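To make the idea concrete, here is a minimal sketch of the two ends of such a pipeline. It is not the pipeline presented in the talk, which the abstract does not describe. The `Report-To` and `NEL` header names and the report fields (`type`, `body.phase`, `body.type`) follow the W3C Network Error Logging and Reporting specifications as we understand them; the collector URL, the function names, and the sample batch are hypothetical.

```python
import json
from collections import Counter


def nel_headers(collector_url, group="network-errors", max_age=86400):
    """Build the response headers that opt a site in to NEL reporting.

    collector_url is a hypothetical HTTPS endpoint that receives reports.
    """
    report_to = {
        "group": group,
        "max_age": max_age,
        "endpoints": [{"url": collector_url}],
    }
    nel = {"report_to": group, "max_age": max_age, "failure_fraction": 1.0}
    return {"Report-To": json.dumps(report_to), "NEL": json.dumps(nel)}


def count_errors_by_phase(payload):
    """Count the network-error reports in one POSTed batch by (phase, type)."""
    counts = Counter()
    for report in json.loads(payload):
        # A collector may also receive other report types; skip those.
        if report.get("type") != "network-error":
            continue
        body = report.get("body", {})
        key = (body.get("phase", "unknown"), body.get("type", "unknown"))
        counts[key] += 1
    return counts


# Hypothetical batch, shaped like the JSON array a browser would POST.
sample_batch = json.dumps([
    {"type": "network-error", "url": "https://www.example.com/",
     "body": {"phase": "dns", "type": "dns.name_not_resolved"}},
    {"type": "network-error", "url": "https://www.example.com/",
     "body": {"phase": "connection", "type": "tcp.timed_out"}},
    {"type": "deprecation", "url": "https://www.example.com/", "body": {}},
])

headers = nel_headers("https://nel.example.com/reports")
counts = count_errors_by_phase(sample_batch)
print(headers["NEL"])
print(dict(counts))
```

Grouping by phase separates failures that never reached the service (DNS, connection) from those that did, which is the signal server-side logs cannot provide. A real collector would also need to accept the browser's POST requests, handle sampling, and store the results.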

Our goal in this talk is to raise awareness of this new API in the SRE community and to enable SREs to detect client issues reliably. If you were to ask us, this is the best thing since sliced bread.

Mohit Suley, Microsoft

Mohit is an engineer on Bing's Live Site Engineering team, whose core mission is designing systems that proactively improve availability and make customers happy. In his spare time, he loves to go for long walks, tinker with hardware, and chase his unachievable goal of reading more books than Bill Gates.

BibTeX
@conference {247314,
author = {Mohit Suley},
title = {Measuring End-User Availability via the Network Error Logging W3C {API}},
year = {2020},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
month = mar,
}