Come Hell or Still Water: Alleviating Tail Latency in Cloud Block Store

Chaolei Hu, Tsinghua University and Alibaba Cloud; Kun Qian, Erci Xu, Yifan Shen, Haoran Zhang, Xue Li, Yuesheng Gu, and Lingjun Zhu, Alibaba Cloud; Fengyuan Ren, Tsinghua University; Ennan Zhai, Alibaba Cloud

Maintaining low tail latency is crucial for cloud storage services. At Alibaba Cloud, our Elastic Block Storage (EBS), like many comparable services, uses multiple layers of load balancing to avoid hot-spot I/Os, a dominant contributor to tail latency.

In the field, however, EBS still suffers from tail latency spikes. Through extensive analysis of production workloads, we have identified the root cause: workload bursts from a small group of Virtual Disks (VDs), which fundamentally affect the tail latency of the entire cluster. We therefore propose a lightweight dual-bucket throttling mechanism that mitigates the issue while maintaining fairness. In addition, we find that even in underloaded scenarios the tail latency remains suboptimal because of the event-loop thread model. We propose a priority-based scheduling mechanism that separates I/O-related tasks from I/O-unrelated ones. Our evaluation shows that the proposed mechanisms reduce tail latency by up to 97% in burst scenarios and 43% in underloaded scenarios. The mechanisms have been deployed across dozens of clusters for more than three months and have served hundreds of trillions of I/O requests. They reduce the P99999 tail latency of steady segments by 59.7% in burst scenarios and of all I/Os by 22% in underloaded scenarios.
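The abstract names the two mechanisms without giving their internals, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "dual-bucket" resembles a classic two-token-bucket limiter (one bucket for the sustained rate, one for the short-term peak) and that priority-based scheduling means serving I/O-related tasks strictly before other tasks in an event loop. All class names, parameters, and rates are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Plain token bucket. Time is passed in explicitly so the sketch is deterministic."""
    rate: float      # tokens added per second (hypothetical unit: I/Os per second)
    capacity: float  # maximum number of tokens the bucket can hold
    tokens: float    # tokens currently available
    last: float = 0.0

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now


class DualBucketThrottle:
    """Assumed design: an I/O is admitted only if BOTH buckets can pay for it."""

    def __init__(self, sustained: TokenBucket, peak: TokenBucket) -> None:
        self.sustained = sustained  # bounds the long-term rate of one VD
        self.peak = peak            # bounds how sharp a short burst can be

    def admit(self, now: float, cost: float = 1.0) -> bool:
        self.sustained.refill(now)
        self.peak.refill(now)
        if self.sustained.tokens >= cost and self.peak.tokens >= cost:
            self.sustained.tokens -= cost
            self.peak.tokens -= cost
            return True
        return False  # caller would queue or delay the request


class PriorityEventLoop:
    """Assumed design: I/O-related tasks always run before I/O-unrelated ones."""

    def __init__(self) -> None:
        self.io_tasks = deque()
        self.background_tasks = deque()

    def submit(self, task, io_related: bool) -> None:
        queue = self.io_tasks if io_related else self.background_tasks
        queue.append(task)

    def run_once(self):
        """Run one task, preferring the I/O queue; return None if both are empty."""
        if self.io_tasks:
            return self.io_tasks.popleft()()
        if self.background_tasks:
            return self.background_tasks.popleft()()
        return None
```

In this sketch a bursty VD drains its peak bucket quickly and is held back until the bucket refills, while a steady VD rarely touches either limit. Strict priority as written can starve background tasks under sustained I/O load; a production scheduler would likely bound that delay, which the abstract does not describe.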

NSDI '26 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {316632,
author = {Chaolei Hu and Kun Qian and Erci Xu and Yifan Shen and Haoran Zhang and Xue Li and Yuesheng Gu and Lingjun Zhu and Fengyuan Ren and Ennan Zhai},
title = {Come Hell or Still Water: Alleviating Tail Latency in Cloud Block Store},
booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
year = {2026},
isbn = {978-1-939133-54-0},
address = {Renton, WA},
pages = {1259--1274},
url = {https://www.usenix.org/conference/nsdi26/presentation/hu-chaolei},
publisher = {USENIX Association},
month = may
}
