Fork in the Road: Reflections and Optimizations for Cold Start Latency in Production Serverless Systems

Xiaohu Chai, Tsinghua University and Ant Group; Tianyu Zhou, Ant Group; Keyang Hu, Tsinghua University; Jianfeng Tan, Tiwei Bie, Anqi Shen, Dawei Shen, Qi Xing, Shun Song, Tongkai Yang, Le Gao, Feng Yu, and Zhengyu He, Ant Group; Dong Du and Yubin Xia, Shanghai Jiao Tong University; Kang Chen, Tsinghua University; Yu Chen, Quan Cheng Laboratory and Tsinghua University

Serverless computing has seen widespread adoption in public cloud environments. However, it continues to suffer from long cold start latency, which remains a key performance bottleneck. We have conducted an in-depth investigation of existing cold start optimizations and evaluated their effectiveness in large-scale industrial deployments. Our study reveals several common limitations in prior research: (1) reliance on simplified assumptions that overlook the complexities of large-scale systems; (2) a narrow focus on optimizing isolated components of the cold start process, while ignoring end-to-end workflow interactions; and (3) insufficient attention to the challenges introduced by concurrent execution environments. As a result, despite incorporating prior techniques, cold start latency on the Ant Group serverless platform remains in the range of hundreds of milliseconds to several seconds.

This paper identifies three previously overlooked sources of latency: (1) control path latency, stemming from interactions within the serverless runtime; (2) resource contention latency, arising under high concurrency and sustained execution; and (3) user code initialization latency, which reflects the trade-off between resource efficiency and startup performance. To address these challenges, we propose a suite of novel techniques that overcome key limitations in existing approaches. These techniques are designed to be both adaptable to real-world workloads and scalable to large deployments. Our system, AFaaS (short for Ant FaaS), reduces cold start latency to the millisecond level. AFaaS has been deployed in production for over 18 months and has consistently demonstrated stable performance at scale.

BibTeX
@inproceedings {308752,
author = {Xiaohu Chai and Tianyu Zhou and Keyang Hu and Jianfeng Tan and Tiwei Bie and Anqi Shen and Dawei Shen and Qi Xing and Shun Song and Tongkai Yang and Le Gao and Feng Yu and Zhengyu He and Dong Du and Yubin Xia and Kang Chen and Yu Chen},
title = {Fork in the Road: Reflections and Optimizations for Cold Start Latency in Production Serverless Systems},
booktitle = {19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)},
year = {2025},
isbn = {978-1-939133-47-2},
address = {Boston, MA},
pages = {199--218},
url = {https://www.usenix.org/conference/osdi25/presentation/chai-xiaohu},
publisher = {USENIX Association},
month = jul
}
