Chiheng Lou, Sheng Qi, and Chao Jin, School of Computer Science, Peking University; Dapeng Nie, Haoran Yang, and Yu Ding, Alibaba Group; Xuanzhe Liu and Xin Jin, School of Computer Science, Peking University
With the proliferation of large language model (LLM) variants, developers are turning to serverless computing for cost-efficient LLM deployment. However, public cloud providers often struggle to provide performance guarantees for serverless LLM serving due to significant cold start latency caused by substantial model sizes and complex runtime dependencies. To address this problem, we present HydraServe, a serverless LLM serving system designed to minimize cold start latency in public clouds. HydraServe proactively distributes models across servers to quickly fetch them, and overlaps cold-start stages within workers to reduce startup latency. Additionally, HydraServe strategically places workers across GPUs to avoid network contention among cold-start instances. To minimize resource consumption during cold starts, HydraServe further introduces pipeline consolidation that can merge groups of workers into individual serving endpoints. Our comprehensive evaluations under diverse settings demonstrate that HydraServe reduces the cold start latency by 1.7×–4.7× and improves service level objective attainment by 1.43×–1.74× compared to baselines.
NSDI '26 Open Access Sponsored by King Abdullah University of Science and Technology (KAUST)
author = {Chiheng Lou and Sheng Qi and Chao Jin and Dapeng Nie and Haoran Yang and Yu Ding and Xuanzhe Liu and Xin Jin},
title = {{HydraServe}: Minimizing Cold Start Latency for Serverless {LLM} Serving in Public Clouds},
booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
year = {2026},
isbn = {978-1-939133-54-0},
address = {Renton, WA},
pages = {415--430},
url = {https://www.usenix.org/conference/nsdi26/presentation/lou},
publisher = {USENIX Association},
month = may
}


