Yubo Liu, Hongbo Li, Xiaojia Huang, Yongfeng Wang, Hanjun Guo, Hui Chen, Yuxin Ren, and Ning Jia, Huawei Technologies Co., Ltd.
This paper examines the model loading bottleneck during LLM inference startup. Existing solutions often improve model loading performance at the expense of compatibility, yet compatibility is crucial in determining whether a technology can be widely applied in real-world scenarios. This work achieves both high performance and strong compatibility by optimizing the cache policy of the kernel file system. We design PPC, a programmable page cache framework that allows users to customize page cache policies in a non-intrusive, flexible, and lightweight manner. We then design MAIO, a cache policy built on PPC, to optimize model loading. MAIO introduces an I/O template-based mechanism that exploits SSD bandwidth, XPU affinity, and data locality to make prefetching and eviction more efficient. Our evaluation shows that MAIO reduces model loading latency by up to 79% compared to existing optimizations. In a real-world application, MAIO improves inference throughput by up to 36% over the other tested solutions in an elastic deployment scenario.
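The abstract does not define what an I/O template contains, so the following is only a rough user-space sketch of one possible reading: record the (offset, length) reads issued during one model load, coalesce them, and replay them as prefetch hints on a later load. The class name `IOTemplate` and its methods are hypothetical, and the sketch uses the standard `posix_fadvise` hint rather than PPC or MAIO, whose kernel-side policy it does not attempt to reproduce.

```python
import os
import tempfile

# Hypothetical illustration only: "IOTemplate" and its methods are invented
# names, and this user-space replay is NOT the PPC/MAIO kernel mechanism.


class IOTemplate:
    """Records the (offset, length) reads of one load for later prefetching."""

    def __init__(self):
        self.extents = []

    def record(self, offset, length):
        """Remember one read observed during a model load."""
        self.extents.append((offset, length))

    def merged(self):
        """Coalesce adjacent or overlapping extents into larger requests."""
        out = []
        for off, length in sorted(self.extents):
            if out and off <= out[-1][0] + out[-1][1]:
                prev_off, prev_len = out[-1]
                out[-1] = (prev_off, max(prev_len, off + length - prev_off))
            else:
                out.append((off, length))
        return out

    def prefetch(self, fd):
        """Hint the kernel to read the recorded extents ahead of use.

        Returns the number of hints issued (0 where posix_fadvise is absent).
        """
        if not hasattr(os, "posix_fadvise"):
            return 0
        hinted = 0
        for off, length in self.merged():
            os.posix_fadvise(fd, off, length, os.POSIX_FADV_WILLNEED)
            hinted += 1
        return hinted
```

Coalescing before issuing hints turns many small reads into fewer large sequential requests, which is one common way to keep an SSD busy. How MAIO itself accounts for XPU affinity and eviction is not described in the abstract and is not modeled here.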

@inproceedings{liu-yubo-fast26,
author = {Yubo Liu and Hongbo Li and Xiaojia Huang and Yongfeng Wang and Hanjun Guo and Hui Chen and Yuxin Ren and Ning Jia},
title = {Accelerating Model Loading in {LLM} Inference by Programmable Page Cache},
booktitle = {24th USENIX Conference on File and Storage Technologies (FAST 26)},
year = {2026},
isbn = {978-1-939133-53-3},
address = {Santa Clara, CA},
pages = {117--132},
url = {https://www.usenix.org/conference/fast26/presentation/liu-yubo},
publisher = {USENIX Association},
month = feb
}