CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving

Yang Liu; Yunfei Gu; Liqiang Zhang; Chentao Wu; Guangtao Xue; Jie Li; Minyi Guo; Junhao Hu; Jie Meng

Yang Liu and Yunfei Gu, Shanghai Jiao Tong University; Liqiang Zhang, Jinan Inspur Data Technology Co., Ltd; Chentao Wu, Guangtao Xue, Jie Li, and Minyi Guo, Shanghai Jiao Tong University; Junhao Hu, Peking University; Jie Meng, Huawei Cloud

Large Language Models (LLMs) are increasingly deployed in agent-based applications with complex prompt structures comprising both invariant and dynamic segments. Existing KV cache reuse strategies, Position-Dependent Caching (PDC) and Position-Independent Caching (PIC), inadequately address these scenarios, either imposing strict positional constraints or introducing significant computational overhead due to Positionally Misaligned KV Drift (PMKD) and window padding problems. We identify a distinct pattern in agent workflows termed Relative-Position-Dependent Caching (RPDC), where reusable segments maintain consistent relative ordering despite absolute position shifts. To address this pattern, we propose CacheSlide, a novel KV cache management system that enhances positional-encoding similarity for fixed segments, computes attention for only a minimal subset of tokens, combines new and cached KVs using learned weights, and implements layer-wise and spill-aware KV-cache optimizations. Our implementation extends vLLM’s KV cache management with Chunked Contextual Position Encoding and Weighted Correction Attention. Experimental evaluation across multiple LLMs and agent benchmarks demonstrates that CacheSlide significantly outperforms state-of-the-art baselines, achieving 3.11-4.3× reduction in latency and 3.5-5.8× improvement in throughput, establishing a new efficiency frontier for agent-based LLM applications.

BibTeX

@inproceedings {315967,
author = {Yang Liu and Yunfei Gu and Liqiang Zhang and Chentao Wu and Guangtao Xue and Jie Li and Minyi Guo and Junhao Hu and Jie Meng},
title = {{CacheSlide}: Unlocking Cross {Position-Aware} {KV} Cache Reuse for Accelerating {LLM} Serving},
booktitle = {24th USENIX Conference on File and Storage Technologies (FAST 26)},
year = {2026},
isbn = {978-1-939133-53-3},
address = {Santa Clara, CA},
pages = {83--99},
url = {https://www.usenix.org/conference/fast26/presentation/liu-yang},
publisher = {USENIX Association},
month = feb
}
