Flattened Clos: Designing High-performance Deadlock-free Expander Data Center Networks Using Graph Contraction

Authors: 

Shizhen Zhao, Qizhou Zhang, Peirui Cao, Xiao Zhang, and Xinbing Wang, Shanghai Jiao Tong University; Chenghu Zhou, Shanghai Jiao Tong University and Chinese Academy of Sciences

Abstract: 

RoCE (RDMA over Converged Ethernet) has been deployed successfully on Clos networks in production data centers. However, as data center network (DCN) bandwidth keeps increasing, building Clos networks is becoming cost-prohibitive, and the more cost-efficient expander graph has therefore received much attention in recent literature. Unfortunately, existing topology and routing designs for expander graphs may contain Cyclic Buffer Dependency (CBD) and incur deadlocks in RoCE networks that enable Priority-based Flow Control (PFC).

We propose Flattened Clos (FC), a topology/routing co-design approach, to eliminate PFC-induced deadlocks in expander networks. FC's topology and routing are designed in three steps: 1) logically divide each ToR switch into k virtual layers and establish connections only between adjacent virtual layers; 2) generate virtual up-down paths for routing; 3) flatten the virtual multi-layered network and the virtual up-down paths using graph contraction. We rigorously prove that FC's design is deadlock-free and validate this property using a real testbed and packet-level simulation. Compared to expander graphs with edge-disjoint-spanning-tree (EDST) based routing (a state-of-the-art CBD-free routing algorithm for expander graphs), FC reduces the average hop count by at least 50% and improves network throughput by 2× to 10× or more. Compared to Clos networks with up-down routing, FC increases network throughput by 1.1× to 2× under all-to-all and uniform random traffic patterns.
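The three design steps can be illustrated with a toy sketch. This is not the authors' algorithm or code: the three-switch topology, the two-layer split, and the helper names `is_up_down`, `contract_path`, and `contract_links` are all assumptions made for illustration. A virtual node is modeled as a `(switch, layer)` pair, virtual links join adjacent layers only, and contraction merges every virtual node back into its physical switch.

```python
# Toy illustration only: topology, layer count, and helper names are assumptions,
# not taken from the paper.

def is_up_down(path):
    """True if the layer index rises by 1 per hop and then falls by 1 per hop.

    Either phase may be empty, but the path may never go up after going down.
    """
    layers = [layer for _, layer in path]
    steps = [b - a for a, b in zip(layers, layers[1:])]
    if any(step not in (1, -1) for step in steps):
        return False  # hops must join adjacent virtual layers
    gone_down = False
    for step in steps:
        if step == -1:
            gone_down = True
        elif gone_down:
            return False  # an "up" hop after a "down" hop
    return True


def contract_path(path):
    """Flatten a virtual path by merging each (switch, layer) into its switch."""
    return [switch for switch, _ in path]


def contract_links(virtual_links):
    """Physical links that remain after contracting the layered virtual graph."""
    return {frozenset((u, v)) for (u, _), (v, _) in virtual_links}


# Step 1 (assumed example): switches A, B, C split into layers 0 and 1,
# with links only between layer 0 and layer 1.
virtual_links = [
    (("A", 0), ("B", 1)),
    (("C", 0), ("B", 1)),
    (("B", 0), ("A", 1)),
    (("C", 0), ("A", 1)),
]

# Step 2: a virtual up-down path from A to C through layer 1 of B.
virtual_path = [("A", 0), ("B", 1), ("C", 0)]

# Step 3: graph contraction yields the flat physical topology and route.
physical_links = contract_links(virtual_links)
physical_path = contract_path(virtual_path)
```

In this sketch the contracted route is the switch sequence A, B, C, and the four virtual links collapse into three physical links. The deadlock-freedom argument itself is in the paper's proof and is not reproduced here.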

NSDI '23 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {286482,
author = {Shizhen Zhao and Qizhou Zhang and Peirui Cao and Xiao Zhang and Xinbing Wang and Chenghu Zhou},
title = {Flattened Clos: Designing High-performance Deadlock-free Expander Data Center Networks Using Graph Contraction},
booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
year = {2023},
isbn = {978-1-939133-33-5},
address = {Boston, MA},
pages = {663--683},
url = {https://www.usenix.org/conference/nsdi23/presentation/zhao-shizhen},
publisher = {USENIX Association},
month = apr
}