Network Monitor: A Tale of ACKnowledging an Observability Gap

Wednesday, 2 October, 2019 - 15:00-15:30

Jason Gedge, Shopify

Abstract: 

In the fall of 2018, we spent nearly six weeks debugging Redis connection issues from our core app, pulling in many engineers along the way. The smoking gun that got our cloud provider involved was a high number of TCP retransmits. Once we brought this evidence to them, their network engineers were able to fix the issue.

This incident showed us that we had an observability gap, due to lack of access and monitoring in our cloud environment. To this end, we built network monitor, a daemon running on all of our nodes to collect relevant network data. This daemon has evolved into a generic eBPF (extended Berkeley Packet Filter) orchestrator. In this talk, you'll learn about what we've built, and should walk away understanding why monitoring your network is a valuable endeavour, as well as how your teams can use eBPF to improve your observability stack.

Jason Gedge, Shopify

Jason is a Staff Production Engineer on the service communication team at Shopify. In the past, he spearheaded the first iteration of Shopify’s self-serve cloud platform and is now rolling out their first cloud service communication mesh. On the side, he keeps busy in the #crazy-cat-people Slack channel and is working on becoming a next-level eBPF wizard. Before Shopify, he was responsible for developer productivity at YouTube's San Bruno office.

BibTeX
@conference {239462,
author = {Jason Gedge},
title = {Network Monitor: A Tale of {ACKnowledging} an Observability Gap},
year = {2019},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}
