ZipLLM: Efficient LLM Storage via Model-Aware Synergistic Data Deduplication and Compression

Zirui Wang, Tingfeng Lan, and Zhaoyuan Su, University of Virginia; Juncheng Yang, Harvard University; Yue Cheng, University of Virginia

Modern model hubs, such as Hugging Face, store tens of petabytes of LLMs, with fine-tuned variants vastly outnumbering base models and dominating storage consumption. Existing storage reduction techniques—such as deduplication and compression—are either LLM-oblivious or not compatible with each other, limiting data reduction effectiveness.

Our large-scale characterization study across all publicly available Hugging Face LLM repositories reveals several key insights: (1) fine-tuned models within the same family exhibit highly structured, sparse parameter differences suitable for delta compression; (2) bitwise similarity enables LLM family clustering; and (3) tensor-level deduplication is better aligned with model storage workloads, achieving high data reduction with low metadata overhead. Building on these insights, we design BitX, an effective, fast, lossless delta compression algorithm that compresses the XORed difference between fine-tuned and base LLMs. We build ZipLLM, a model storage reduction pipeline that unifies tensor-level deduplication and lossless BitX compression. By synergizing deduplication and compression around LLM family clustering, ZipLLM reduces model storage consumption by 54%, over 20% higher than state-of-the-art deduplication and compression approaches.
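The two ideas in the abstract, XOR-based delta compression against a base model and tensor-level deduplication, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, tensors are treated as raw byte strings, and zlib stands in as a generic lossless compressor, since the actual BitX encoding is not described on this page.

```python
import hashlib
import struct
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same byte length")
    n = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return n.to_bytes(len(a), "little")


def compress_delta(base: bytes, tuned: bytes) -> bytes:
    """Compress the XOR difference between a fine-tuned and a base tensor.

    Assumption: zlib is a placeholder for the paper's compressor.
    """
    return zlib.compress(xor_bytes(base, tuned), 9)


def restore_tuned(base: bytes, blob: bytes) -> bytes:
    """Losslessly rebuild the fine-tuned tensor from the base and the delta."""
    return xor_bytes(base, zlib.decompress(blob))


def dedup_tensors(tensors: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Store each distinct tensor once, keyed by its SHA-256 digest."""
    store: dict[str, bytes] = {}
    index: dict[str, str] = {}
    for name, raw in tensors.items():
        digest = hashlib.sha256(raw).hexdigest()
        store.setdefault(digest, raw)  # keep only the first copy
        index[name] = digest           # every name maps to a stored digest
    return store, index


# Toy data: a "base" tensor of 1024 float32 values and a "fine-tuned"
# variant that differs in only a few positions.
values = [i * 0.001 for i in range(1024)]
base = struct.pack("<1024f", *values)
for i in (10, 200, 777):
    values[i] += 1e-4
tuned = struct.pack("<1024f", *values)

blob = compress_delta(base, tuned)
recovered = restore_tuned(base, blob)

store, index = dedup_tensors(
    {"a.weight": base, "b.weight": base, "c.weight": tuned}
)
```

Because the fine-tuned tensor differs from the base in only a few positions, the XOR result is mostly zero bytes and compresses well, while the round trip through `restore_tuned` stays bit-exact. In the deduplication step, the two identical tensors share a single stored copy.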

BibTeX

@inproceedings {316110,
author = {Zirui Wang and Tingfeng Lan and Zhaoyuan Su and Juncheng Yang and Yue Cheng},
title = {{ZipLLM}: Efficient {LLM} Storage via {Model-Aware} Synergistic Data Deduplication and Compression},
booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
year = {2026},
isbn = {978-1-939133-54-0},
address = {Renton, WA},
pages = {2371--2387},
url = {https://www.usenix.org/conference/nsdi26/presentation/wang-zirui},
publisher = {USENIX Association},
month = may
}
