NetHint: White-Box Networking for Multi-Tenant Data Centers

Website Maintenance Alert

Due to scheduled maintenance, the USENIX website will not be available on Saturday, April 13, from 12:00 am–12:30 am Pacific Daylight Time (UTC-7). We apologize for the inconvenience.

If you are trying to register for NSDI '24 or register for PEPR '24, please complete your registration before or after this time period.

Authors: 

Jingrong Chen, Duke University; Hong Zhang, University of California, Berkeley; Wei Zhang, Duke University; Liang Luo, University of Washington; Jeffrey Chase, Duke University; Ion Stoica, University of California, Berkeley; Danyang Zhuo, Duke University

Abstract: 

A cloud provider today provides its network resources to its tenants as a black box, such that cloud tenants have little knowledge of the underlying network characteristics. Meanwhile, data-intensive applications have increasingly migrated to the cloud, and these applications have both the ability and the incentive to adapt their data transfer schedules based on the cloud network characteristics. We find that the black-box networking abstraction and the adaptiveness of data-intensive applications together create a mismatch, leading to sub-optimal application performance.

This paper explores a white-box approach to resolving this mismatch. We propose NetHint, an interactive mechanism between a cloud tenant and a cloud provider to jointly enhance application performance. With NetHint, the provider provides a hint — an indirect indication of the underlying network characteristics (e.g., link-layer network topologies for a tenant's virtual machines, number of co-locating tenants, network bandwidth utilization), and the tenant's applications then adapt their transfer schedules accordingly. The NetHint design provides abundant network information for cloud tenants to compute their optimal transfer schedules, while introducing little overhead for the cloud provider to collect and expose this information. Evaluation results show that NetHint improves the average performance of allreduce completion time, broadcast completion time, and MapReduce shuffle completion time by 2.7×, 1.5×, and 1.2×, respectively.

NSDI '22 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

This content is available to:

BibTeX
@inproceedings {278382,
author = {Jingrong Chen and Hong Zhang and Wei Zhang and Liang Luo and Jeffrey Chase and Ion Stoica and Danyang Zhuo},
title = {{NetHint}: {White-Box} Networking for {Multi-Tenant} Data Centers},
booktitle = {19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)},
year = {2022},
isbn = {978-1-939133-27-4},
address = {Renton, WA},
pages = {1327--1343},
url = {https://www.usenix.org/conference/nsdi22/presentation/chen-jingrong},
publisher = {USENIX Association},
month = apr
}

Presentation Video