Congcong Miao and Minggang Chen, Tencent; Arpit Gupta, UC Santa Barbara; Zili Meng, Lianjin Ye, and Jingyu Xiao, Tsinghua University; Jie Chen, Zekun He, and Xulong Luo, Tencent; Jilong Wang, Tsinghua University, BNRist, and Peng Cheng Laboratory; Heng Yu, Tsinghua University
Degradation or failure events in optical backbone networks affect the service level agreements for cloud services. It is critical to detect and troubleshoot these events promptly to minimize their impact. Existing telemetry systems rely on arcane tools (e.g., SNMP) and vendor-specific controllers to collect optical data, which affects both the flexibility and scale of these systems. As a result, they fail to collect the required data on time to detect and troubleshoot degradation or failure events in a timely fashion. This paper presents the design and implementation of OpTel, an optical telemetry system, that uses a centralized vendor-agnostic controller to collect optical data in a streaming fashion. More specifically, it offers flexible vendor-agnostic interfaces between the optical devices and the controller and offloads data-management tasks (e.g., creating a queryable database) from the devices to the controller. As a result, OpTel enables the collection of fine-grained optical telemetry data at the one-second granularity. It has been running in Tencent's optical backbone network for the past six months. The fine-grained data collection enables the detection of short-lived events (i.e., ephemeral events). Compared to existing telemetry systems, OpTel accurately detects 2x more optical events. It also enables troubleshooting of these optical events in a few seconds, which is orders of magnitude faster than the state-of-the-art.
NSDI '22 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Congcong Miao and Minggang Chen and Arpit Gupta and Zili Meng and Lianjin Ye and Jingyu Xiao and Jie Chen and Zekun He and Xulong Luo and Jilong Wang and Heng Yu},
title = {Detecting Ephemeral Optical Events with {OpTel}},
booktitle = {19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)},
year = {2022},
isbn = {978-1-939133-27-4},
address = {Renton, WA},
pages = {339--353},
url = {https://www.usenix.org/conference/nsdi22/presentation/miao},
publisher = {USENIX Association},
month = apr
}