Evolvable Network Telemetry at Facebook

Authors: 

Yang Zhou, Harvard University; Ying Zhang, Facebook; Minlan Yu, Harvard University; Guangyu Wang, Dexter Cao, Eric Sung, and Starsky Wong, Facebook

Abstract: 

Network telemetry is essential for service availability and performance in large-scale production environments. While novel measurement primitives and algorithms for network telemetry have emerged recently, one challenge that is not well studied is change. Facebook runs fast-evolving networks to adapt to varying application requirements. Changes occur not only in the data collection and processing stages but also when the data is interpreted and consumed by applications. In this paper, we present PCAT, a production change-aware telemetry system that handles changes in fast-evolving networks. We propose a change cube abstraction to systematically track changes, and an intent-based layering design to confine and track changes. By sharing our experiences with PCAT, we bring a new aspect to the monitoring research area: improving the adaptivity and evolvability of network telemetry.

NSDI '22 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {276942,
author = {Yang Zhou and Ying Zhang and Minlan Yu and Guangyu Wang and Dexter Cao and Eric Sung and Starsky Wong},
title = {Evolvable Network Telemetry at Facebook},
booktitle = {19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)},
year = {2022},
isbn = {978-1-939133-27-4},
address = {Renton, WA},
pages = {961--975},
url = {https://www.usenix.org/conference/nsdi22/presentation/zhou},
publisher = {USENIX Association},
month = apr
}
