DroidSpeak: KV Cache Sharing Across Fine-tuned Model Variants

Yuhan Liu; Yuyang Huang; Jiayi Yao; Shaoting Feng; Zhuohan Gu; Kuntai Du; Hanchen Li; Yihua Cheng; Junchen Jiang; Shan Lu; Madan Musuvathi; Esha Choukse

Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, and Junchen Jiang, University of Chicago; Shan Lu, Madan Musuvathi, and Esha Choukse, Microsoft

Compound AI systems, such as agentic systems, are an emerging trend in large-scale enterprise settings: multiple LLMs, each specialized for different users, tasks, and/or roles, work together. In these scenarios, different models often process inputs that share the same context prefix. Although much prior work enables reuse of prefix KV caches across inputs to a single model, how one model can reuse the prefix KV caches of a different model remains an open question.

We introduce DroidSpeak, the first distributed LLM inference system that enables KV cache reuse across distributed nodes serving different LLMs, provided the LLMs share the same architecture. We also present the first study of the impact of sharing KV caches across different LLMs, including whether and when such sharing affects quality. Guided by these findings, DroidSpeak selectively recomputes a few layers of the KV cache produced by another LLM and reuses the remaining layers, with negligible quality loss. Carefully pipelining the layer-wise recomputation with the loading of the reused KV cache further improves inference performance. Experiments on diverse datasets and model pairs show that, compared with a baseline that allows no KV cache sharing across models, DroidSpeak achieves up to 4× higher throughput and about 3.1× faster prefill (time to first token), with negligible quality loss in F1 score, Rouge-L, or code similarity score.
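The idea of recomputing a few layers, reusing the rest, and overlapping the two can be illustrated with a toy cost model. This is a minimal sketch under stated assumptions, not the paper's algorithm: the per-layer "sensitivity" scores, the top-k selection rule, the functions `choose_layers` and `prefill_time`, and the fixed per-layer load and compute costs are all hypothetical. The pipelined case assumes KV loading and GPU recomputation run on independent resources and overlap perfectly, which is an idealized lower bound.

```python
from dataclasses import dataclass


@dataclass
class LayerPlan:
    # Hypothetical per-layer decision: True = recompute this layer's KV,
    # False = reuse the KV cache produced by the other model.
    recompute: list[bool]


def choose_layers(sensitivity: list[float], budget: int) -> LayerPlan:
    """Recompute the `budget` layers with the highest sensitivity score.

    The scores are an assumed input (e.g. an offline estimate of the quality
    drop when a layer's KV cache is borrowed); this rule is illustrative only.
    """
    ranked = sorted(range(len(sensitivity)), key=lambda i: sensitivity[i], reverse=True)
    chosen = set(ranked[:budget])
    return LayerPlan([i in chosen for i in range(len(sensitivity))])


def prefill_time(plan: LayerPlan, load_ms: float, compute_ms: float, pipelined: bool) -> float:
    """Idealized cost model: each reused layer costs one KV load,
    each recomputed layer costs one layer of compute."""
    total_load = plan.recompute.count(False) * load_ms
    total_compute = plan.recompute.count(True) * compute_ms
    if pipelined:
        # Assumes loads and recomputation overlap perfectly on separate resources.
        return max(total_load, total_compute)
    # Without pipelining, loading and recomputation happen one after the other.
    return total_load + total_compute


if __name__ == "__main__":
    sens = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.7, 0.1]
    plan = choose_layers(sens, budget=3)
    print(plan.recompute)
    print(prefill_time(plan, load_ms=1.0, compute_ms=2.0, pipelined=False))
    print(prefill_time(plan, load_ms=1.0, compute_ms=2.0, pipelined=True))
```

With these made-up numbers, layers 1, 3, and 6 are recomputed, the sequential plan costs 11.0 ms, and the overlapped plan costs 6.0 ms; recomputing all eight layers would cost 16.0 ms. Real systems also face data dependencies between layers, so actual gains depend on the implementation.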


BibTeX

@inproceedings {316100,
author = {Yuhan Liu and Yuyang Huang and Jiayi Yao and Shaoting Feng and Zhuohan Gu and Kuntai Du and Hanchen Li and Yihua Cheng and Junchen Jiang and Shan Lu and Madan Musuvathi and Esha Choukse},
title = {{DroidSpeak}: {KV} Cache Sharing Across Fine-tuned Model Variants},
booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
year = {2026},
isbn = {978-1-939133-54-0},
address = {Renton, WA},
pages = {319--338},
url = {https://www.usenix.org/conference/nsdi26/presentation/liu-yuhan},
publisher = {USENIX Association},
month = may
}
