NSDI '20 List of Accepted Papers

NSDI '20 offers authors the choice of two submission deadlines. Prepublication versions of the accepted papers from the spring submission deadline are available below. In February, the full Proceedings, as well as all of the final paper PDFs, will be posted.

Spring Accepted Papers

Frequency Configuration for Low-Power Wide-Area Networks in a Heartbeat

Akshay Gadre, Carnegie Mellon University; Revathy Narayanan, Indian Institute of Technology Madras; Anh Luong, Anthony Rowe, Bob Iannucci, and Swarun Kumar, Carnegie Mellon University

Available Media

Low-power Wide-Area Networks (LP-WANs) are seen as a leading candidate to network the Internet-of-Things at city-scale. Yet, the battery life and performance of LP-WAN devices varies greatly based on their operating frequency. In multipath-rich urban environments, received signal power varies rapidly with a low-power transmitter's frequency, impacting its transmission time, data rate and battery life. However, the low bandwidth of LP-WANs means that there are hundreds of operating frequencies to choose from. Among them, we show how choosing a select few of these frequencies(<3.55%) effectively triples the battery life when compared to the rest for LP-WAN devices.

This paper presents Chime, a system enabling LP-WAN base stations to identify an optimal frequency of operation after the client sends one packet at one frequency. Chime achieves this by analyzing the wireless channels of this packet across many base stations to disentangle multipath and ascertain an optimal frequency that maximizes client battery life and minimizes interference. We implement Chime on a campus-scale test-bed and achieve a median gain of 3.4 dB in SINR leading to a median increase in battery life of 230% (~1.4-5.7 years), data rate by 3.3X and reduction in interference of 2.8X over commodity LP-WANs.

Plankton: Scalable network configuration verification through model checking

Santhosh Prabhu, Kuan Yen Chou, Ali Kheradmand, Brighten Godfrey, and Matthew Caesar, University of Illinois at Urbana Champaign

Available Media

Network configuration verification enables operators to ensure that the network will behave as intended, prior to deployment of their configurations. Although techniques ranging from graph algorithms to SMT solvers have been proposed, scalable configuration verification with sufficient protocol support continues to be a challenge. In this paper, we show that by combining equivalence partitioning with explicit-state model checking, network configuration verification can be scaled significantly better than the state of the art, while still supporting a rich set of protocol features. We propose Plankton, which uses symbolic partitioning to manage large header spaces and efficient model checking to exhaustively explore protocol behavior. Thanks to a highly effective suite of optimizations including state hashing, partial order reduction, and policy-based pruning, Plankton successfully verifies policies in industrial-scale networks quickly and compactly, at times reaching a 10000x speedup compared to the state of the art.

Expanding across time to deliver bandwidth efficiency and low latency

William M. Mellette, Rajdeep Das, Yibo Guo, Rob McGuinness, Alex C. Snoeren, and George Porter, University of California San Diego

Available Media

Datacenters need networks that support both low-latency and high-bandwidth packet delivery to meet the stringent requirements of modern applications. We present Opera, a dynamic network that delivers latency-sensitive traffic quickly by relying on multi-hop forwarding in the same way as expander-graph-based approaches, but provides near-optimal bandwidth for bulk flows through direct forwarding over time-varying source-to-destination circuits. Unlike prior approaches, Opera requires no separate electrical network and no active circuit scheduling. The key to Opera's design is the rapid and deterministic reconfiguration of the network, piece-by-piece, such that at any moment in time the network implements an expander graph, yet, integrated across time, the network provides bandwidth-efficient single-hop paths between all racks. We show that Opera supports low-latency traffic with flow completion times comparable to cost-equivalent static topologies, while delivering up to 4$\times$ the bandwidth for all-to-all traffic and supporting up to 60\% higher load for published datacenter workloads.

Config2Spec: Mining Network Specifications from Network Configurations

Rüdiger Birkner, ETH Zürich; Dana Drachsler-Cohen, Technion; Laurent Vanbever and Martin Vechev, ETH Zürich

Available Media

Network verification and configuration synthesis are promising approaches to make networks more reliable and secure by enforcing a set of policies. However, these approaches require a formal and precise description of the intended network behavior, imposing a major barrier to their adoption: network operators are not only reluctant to write formal specifications, but often do not even know what these specifications are.

We present Config2Spec, a system that automatically synthesizes a formal specification (a set of policies) of a network given its configuration and a failure model (e.g., up to two link failures). A key technical challenge is to design a synthesis algorithm which can efficiently explore the large space of possible policies. To address this challenge, Config2Spec relies on a careful combination of two well-known methods: data plane analysis and control plane verification.

Experimental results show that Config2Spec scales to mining specifications of large networks (>150 routers).

XRD: Scalable Messaging System with Cryptographic Privacy

Albert Kwon, MIT; David Lu, MIT PRIMES; Srinivas Devadas, MIT

Available Media

Even as end-to-end encrypted communication becomes more popular, private messaging remains a challenging problem due to metadata leakages, such as who is communicating with whom. Most existing systems that hide communication metadata either (1) do not scale easily, (2) incur significant overheads, or (3) provide weaker guarantees than cryptographic privacy, such as differential privacy or heuristic privacy. This paper presents XRD (short for Crossroads), a metadata private messaging system that provides cryptographic privacy, while scaling easily to support more users by adding more servers. At a high level, XRD uses multiple mix networks in parallel with several techniques, including a novel technique we call aggregate hybrid shuffle. As a result, XRD can support 2 million users with 228 seconds of latency with 100 servers. This is 13.3× and 4.1× faster than Atom and Pung, respectively, which are prior scalable messaging systems with cryptographic privacy.

Re-architecting Congestion Management in Lossless Ethernet

Wenxue Cheng and Kun Qian, Tsinghua University; Wanchun Jiang, Central South University; Tong Zhang and Fengyuan Ren, Tsinghua University

Available Media

The lossless Ethernet is attractive for data centers and cluster systems, but various performance issues, such as unfairness, head-of-line blocking and congestion spreading, etc., impede its large-scale deployment in production systems. Through fine-grained experimental observations, we inspect the interactions between flow control and congestion control, and are aware that the radical cause of performance problems is the ineffective elements in the congestion management architecture for lossless Ethernet, including the improper congestion detection mechanism and inadequate rate adjustment law.

Inspired by these insights and findings obtained in experiment investigations, we revise the congestion management architecture, and propose the Photonic Congestion Notification (PCN) scheme, which consists of two basic components: (i) a novel congestion detection and identification mechanism to recognize which flows are really responsible for congestion; (ii) a receiver-driven rate adjustment method to alleviate congestion in as short as 1 RTT. We implement PCN using DPDK NICs and conduct evaluations using testbed experiments and simulations. The results show that PCN greatly improves performance under concurrent burst workload, and significantly mitigates PFC PAUSE messages and reduces the flow completion time under realistic workload.

Measuring Congestion in High-Performance Datacenter Interconnects

Saurabh Jha and Archit Patke, University of Illinois at Urbana-Champaign; Jim Brandt and Ann Gentile, Sandia National Lab; Benjamin Lim, University of Illinois at Urbana-Champaign; Mike Showerman and Greg Bauer, National Center for Supercomputing Applications; Larry Kaplan, Cray Inc.; Zbigniew Kalbarczyk, University of Illinois at Urbana-Champaign; William Kramer, University of Illinois at Urbana-Champaign and National Center for Supercomputing Applications; R. Iyer, University of Illinois at Urbana-Champaign

Available Media

While it is widely acknowledged that network congestion in High Performance Computing (HPC) systems can significantly degrade application performance, there has been little to no quantification of congestion on credit-based interconnect networks. We present a methodology for detecting, extracting, and characterizing regions of congestion in networks. We have implemented the methodology in a deployable tool, Monet, which can provide such analysis and feedback at runtime. Using Monet, we characterize and diagnose congestion in the world's largest 3D torus network of Blue Waters, a 13.3-petaflop supercomputer at the National Center for Supercomputing Applications. Our study deepens the understanding of production congestion at a scale that has never been evaluated before.

AccelTCP: Accelerating Network Applications with Stateful TCP Offloading

YoungGyoun Moon and SeungEon Lee, KAIST; Muhammad Asim Jamshed, Intel Labs; KyoungSoo Park, KAIST

Available Media

The performance of modern key-value servers or layer-7 load balancers often heavily depends on the efficiency of the underlying TCP stack. Despite numerous optimizations such as kernel-bypassing and zero-copying, performance improvement with a TCP stack is fundamentally limited due to the protocol conformance overhead for compatible TCP operations. Unfortunately, the protocol conformance overhead amounts to as large as 60% of the entire CPU cycles for short-lived connections or degrades the performance of L7 proxying by 3.2x to 6.3x.

This work presents AccelTCP, a hardware-assisted TCP stack architecture that harnesses programmable network interface cards (NICs) as a TCP protocol accelerator. AccelTCP can offload complex TCP operations such as connection setup and teardown completely to NIC, which simplifies the host stack operations and frees a significant amount of CPU cycles for application processing. In addition, it supports running connection splicing on NIC so that the NIC relays all packets of the spliced connections with zero DMA overhead. Our evaluation shows that AccelTCP enables short-lived connections to perform comparably to persistent connections. It also improves the performance of Redis, a popular in-memory key-value store, and HAProxy, a widely-used layer-7 load balancer, by 2.3x and 11.9x, respectively.

Enabling Programmable Transport Protocols in High-Speed NICs

Mina Tahmasbi Arashloo and Alexey Lavrov, Princeton University; Manya Ghobadi, MIT; Jennifer Rexford, David Walker, and David Wentzlaff, Princeton University

Available Media

Data-center network stacks are moving into hardware to achieve 100 Gbps data rates and beyond at low latency and low CPU utilization. However, hardwiring the network stack in the NIC would stifle innovation in transport protocols. In this paper, we enable programmable transport protocols in high-speed NICs by designing Tonic, a flexible hardware architecture for transport logic. At 100 Gbps, transport protocols must generate a data segment every few nanoseconds using only a few kilobits of per-flow state on the NIC. By identifying common patterns across transport logic of different transport protocols, we design an efficient hardware "template" for transport logic that satisfies these constraints while being programmable with a simple API. Experiments with our FPGA-based prototype show that Tonic can support the transport logic of a wide range of protocols and meet timing for 100 Gbps of back-to-back 128-byte packets. That is, every 10 ns, our prototype generates the address of a data segment for one of more than a thousand active flows for a downstream DMA pipeline to fetch and transmit a packet.

Contra: A Programmable System for Performance-aware Routing

Kuo-Feng Hsu, Rice University; Ryan Beckett, Microsoft Research; Ang Chen, Rice University; Jennifer Rexford, Praveen Tammana, and David Walker, Princeton University

Available Media

We present Contra, a system for performance-aware routing that can adapt to traffic changes at hardware speeds. While point solutions exist for a fixed topology (e.g., a Fattree) with a fixed routing policy (e.g., use least utilized paths), Contra can operate seamlessly over any network topology and a wide variety of sophisticated routing policies. Users of Contra write network-wide policies that rank network paths given their current performance. A compiler then analyzes such policies in conjunction with the network topology and decomposes them into switch-local P4 programs, which collectively implement a new, specialized distance-vector protocol. This protocol generates compact probes that traverse the network, gathering path metrics to optimize for the user policy dynamically. Switches respond to changing network conditions by routing flowlets along the best policy-compliant paths. Our experiments show that Contra scales to large networks, and that in terms of flow completion times, it is competitive with hand-crafted systems that have been customized for specific topologies and policies.

Network Error Logging: Client-side measurement of end-to-end web service reliability

Sam Burnett and Lily Chen, Google; Douglas A. Creager, GitHub; Misha Efimov, Ilya Grigorik, and Ben Jones, Google; Harsha V. Madhyastha, Google and University of Michigan; Pavlos Papageorge, Brian Rogan, Charles Stahl, and Julia Tuttle, Google

Available Media

We present NEL (Network Error Logging), a planet-scale, client-side, network reliability measurement system. NEL is implemented in Google Chrome and has been proposed as a new W3C standard, letting any web site operator collect reports of clients’ successful and failed requests to their sites. These reports are similar to web server logs, but include information about failed requests that never reach serving infrastructure. Reports are uploaded via redundant failover paths, reducing the likelihood of shared-fate failures of report uploads. We have designed NEL such that service providers can glean no additional information about users or their behavior compared to what services already have visibility into during normal operation. Since 2014, NEL has been invaluable in monitoring all of Google’s domains, allowing us to detect and investigate instances of DNS hijacking, BGP route leaks, protocol deployment bugs, and other problems where packets might never reach our servers. This paper presents the design of NEL, case studies of real outages, and deployment lessons for other operators who choose to use NEL to monitor their traffic.

Gandalf: An Intelligent, End-To-End Analytics Service for Safe Deployment in Large-Scale Cloud Infrastructure

Ze Li, Qian Cheng, Ken Hsieh, and Yingnong Dang, Microsoft Azure; Peng Huang, Johns Hopkins University; Pankaj Singh and Xinsheng Yang, Microsoft Azure; Qingwei Lin, Microsoft Research; Youjiang Wu, Sebastien Levy, and Murali Chintalapati, Microsoft Azure

Available Media

Modern cloud systems have a vast number of components that continuously undergo updates. Deploying these frequent updates quickly without breaking the system is challenging. In this paper, we present Gandalf, an end-to-end analytics service for safe deployment in a large-scale system infrastructure. Gandalf enables rapid and robust impact assessment of software rollouts to catch bad rollouts before they cause widespread outages. Gandalf monitors and analyzes various fault signals. It will correlate each signal against all the ongoing rollouts using a spatial and temporal correlation algorithm. The core decision logic of Gandalf includes an ensemble ranking algorithm that determines which rollout may have caused the fault signals, and a binary classifier that assesses the impact of the fault signals. The analysis result will decide whether a rollout is safe to proceed or should be stopped.

By using a lambda architecture, Gandalf provides both real-time and long-term deployment monitoring with automated decisions and notifications. Gandalf has been running in production in Microsoft Azure for more than 18 months, serving both data-plane and control-plane components. It achieves 92.4% precision and 100% recall (no high-impact service outages in Azure Compute were caused by bad rollouts) for data-plane rollouts. For control-plane rollouts, Gandalf achieves 94.9% precision and 99.8% recall.

Fine-Grained Replicated State Machines for a Cluster Storage System

Ming Liu and Arvind Krishnamurthy, University of Washington; Harsha V. Madhyastha, University of Michigan; Rishi Bhardwaj, Karan Gupta, Chinmay Kamat, Huapeng Yuan, Aditya Jaltade, Roger Liao, Pavan Konka, and Anoop Jawahar, Nutanix

Available Media

We describe the design and implementation of a consistent and fault-tolerant metadata index for a scalable block storage system. The block storage system supports virtualized execution of legacy applications inside enterprise clusters by automatically distributing the stored blocks across the cluster’s storage resources. To support the availability and scalability needs of the block storage system, we develop a distributed index that provides a replicated and consistent key-value storage abstraction. The key idea underlying our design is the use of fine-grained replicated state machines, wherein every key-value pair in the index is treated as a separate replicated state machine. This approach has many advantages over a traditional coarse-grained approach that represents an entire shard of data as a state machine: it enables effective use of multiple storage devices and cores, it is more robust to both short- and long-term skews in key access rates, and it can tolerate variations in key-value access latencies. The use of fine-grained replicated state machines, however, raises new challenges, which we address by co-designing the consensus protocol with the data store and streamlining the operation of the per-key replicated state machines. We demonstrate that fine-grained replicated state machines can provide significant performance benefits, characterize the performance of the system in the wild, and report on our experiences in building and deploying the system.

Meaningful Availability

Tamás Hauer, Philipp Hoffmann, John Lunney, Dan Ardelean, and Amer Diwan, Google

Available Media

High availability is a critical requirement for cloud applications: if a sytem does not have high availability, users cannot count on it for their critical work. Having a metric that meaningfully captures availability is useful for both users and system developers. It informs users what they should expect of the availability of the application. It informs developers what they should focus on to improve user-experienced availability. This paper presents and evaluates, in the context of Google's G~Suite, a novel availability metric: \windowed. This metric has two main components. First, it directly models user-perceived availability and avoids the bias in commonly used availability metrics. Second, by simultaneously calculating the availability metric over many windows it can readily distinguish between many short periods of unavailability and fewer but longer periods of unavailability.

SP-PIFO: Approximating Push-In First-Out Behaviors using Strict-Priority Queues

Albert Gran Alcoz, Alexander Dietmüller, and Laurent Vanbever, ETH Zürich

Available Media

Push-In First-Out (PIFO) queues are hardware primitives which enable programmable packet scheduling by allowing to perfectly reorder packets at line rate. While promising, implementing PIFO queues in hardware and at scale is not easy: only hardware designs (not implementations) exist and they can only support about 1000 flows.

In this paper, we introduce SP-PIFO, a programmable packet scheduler which closely approximates the behavior of PIFO queues using strict-priority queues---\emph{at line rate, at scale, and on existing devices}. The key insight behind SP-PIFO is to dynamically adapt the mapping between packet ranks and available queues to minimize the scheduling errors. We present a mathematical formulation of the problem and derive an adaptation technique which closely approximates the optimal queue mapping without any traffic knowledge.

We fully implement SP-PIFO in P4 and evaluate it on real workloads. We show that SP-PIFO: (i) closely matches ideal PIFO performance, with as little as 8 priority queues; (ii) arbitrarily scales to large amount of flows and ranks; and (iii) quickly adapts to traffic variations. We also show that SP-PIFO runs at line rate on existing programmable data planes.

NetSMC: A Custom Symbolic Model Checker for Stateful Network Verification

Yifei Yuan, Intentionet; Soo-Jin Moon, Sahil Uppal, Limin Jia, and Vyas Sekar, Carnegie Mellon University

Available Media

Modern networks enforce rich and dynamic policies (e.g., dynamic service chaining and path pinning) over a number of complex and stateful NFs (e.g., stateful firewall and load balancer). Verifying if those policies are correctly implemented is important to ensure the network’s availability, safety, and security. Unfortunately, theoretical results suggest that verifying even simple policies (e.g., A cannot talk to B) in stateful networks is undecidable. Consequently, any approach for stateful network verification has to fundamentally make some relaxations; e.g., either on policies supported, or the network behaviors it can capture, or in terms of the soundness/completeness guarantees. In this paper, we identify practical opportunities for relaxations in order to develop an efficient verification tool. First, we identify key domain-specific insights to develop a more compact network semantic model which is equivalent to a general semantic model for checking a wide range of policies under practical conditions. Second, we identify a restrictive-yet-expressive policy language to support a wide range of policies including dynamic service chaining and path pinning while enable efficient verification. Third, we develop customized symbolic model checking algorithms as our model and policy specification allows us to succinctly encode network states using existential first-order logic, which enables efficient checking algorithms. We prove the correctness of our approach for a subset of policies and show that our tool, NetSMC, achieves orders of magnitude speedup compared to existing approaches.

TinySDR: Low-Power SDR Platform for Over-the-Air Programmable IoT Testbeds

Mehrdad Hessar, Ali Najafi, Vikram Iyer, and Shyamnath Gollakota, University of Washington

Available Media

Wireless protocol design for IoT networks is an active area of research which has seen significant interest and developments in recent years. The research community is however handicapped by the lack of a flexible, easily deployable platform for prototyping IoT endpoints that would allow for ground up protocol development and investigation of how such protocols perform at scale. We introduce tinySDR, the first software-defined radio platform tailored to the needs of power-constrained IoT endpoints. TinySDR provides a standalone, fully programmable low power software-defined radio solution that can be duty cycled for battery operation like a real IoT endpoint, and more importantly, can be programmed over the air to allow for large scale deployment. We present extensive evaluation of our platform showing it consumes as little as 30 uW of power in sleep mode, which is 10,000x lower than existing SDR platforms. We present two case studies by implementing LoRa and BLE beacons on the platform and achieve sensitivities of -126 dBm and -94 dBm respectively while consuming 11% and 3% of the FPGA resources. Finally, using tinySDR, we explore the research question of whether an IoT device can demodulate concurrent LoRa transmissions in real-time, within its power and computing constraints.

Fall Accepted Papers

Is Big Data Performance Reproducible in Modern Cloud Networks?

Alexandru Uta and Alexandru Custura, Vrije Universiteit Amsterdam; Dmitry Duplyakin, University of Utah; Ivo Jimenez, UC Santa Cruz; Jan Rellermeyer, TU Delft; Carlos Maltzahn, UC Santa Cruz; Robert Ricci, University of Utah; Alexandru Iosup, Vrije Universiteit Amsterdam

Telekine: Secure Computing with Cloud GPUs

Tyler Hunt, Zhipeng Jia, Vance Miller, Ariel Szekely, and Yige Hu, The University of Texas at Austin; Christopher J. Rossbach, The University of Texas at Austin and VMware Research; Emmett Witchel, The University of Texas at Austin

Adapting TCP for Reconfigurable Datacenter Networks

Matthew Mukerjee, Carnegie Mellon University / Nefeli Networks; Christopher Canel, Carnegie Mellon University; Weiyang Wang, UC San Diego; Daehyeok Kim, Carnegie Mellon University / Microsoft Research; Srinivasan Seshan, Carnegie Mellon University; Alex C. Snoeren, UC San Diego

Liveness Verification of Stateful Networks

Farnaz Yousefi, Johns Hopkins University; Anubhavnidhi Abhashkumar and Kausik Subramanian, University of Wisconsin-Madison; Kartik Hans, IIT Delhi; Soudeh Ghorbani, Johns Hopkins University; Aditya Akella, Aws Albarghouthi, and Loris D'Antoni, University of Wisconsin-Madison

Programmable Calendar Queues for Packet Scheduling

Naveen Kr. Sharma, Chenxingyu Zhao, and Ming Liu, University of Washington; Pravein Govindan, School of Computing, National University of Singapore; Changhoon Kim, Barefoot Networks; Arvind Krishnamurthy, University of Washington; Anirudh Sivaraman, NYU