Latency-Aware Caching with Delayed Hits: From Bursty Traffic to Pipeline Architectures

Nadav Keren, Gil Einziger, and Gabriel Scalosub, Ben Gurion University of The Negev

Modern computing systems rely on caching to reduce access latency and optimize resource utilization. However, in heterogeneous storage and cloud environments, non-uniform access latencies across storage tiers, network locations, and intermediary caches undermine traditional caching. Moreover, modern cache algorithms that attempt to capture multiple access patterns (recency, frequency, and burstiness) often become complex and difficult to maintain.

As a key contribution, we propose an adaptive caching architecture that treats caching strategies as a pipeline of simple, orthogonal policies, each focused on a distinct access bias. This modular design is easier to expand, debug, and integrate, and it self-adjusts the memory resources allocated to each stage to optimize overall workload performance. New heuristics can be introduced dynamically without disrupting existing behaviors.
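To make the pipeline idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each stage is a small fixed-capacity block with its own simple policy, that new items enter at the first stage, and that a victim evicted from one stage is demoted to the next. The names `LRUStage` and `PipelineCache` are hypothetical, every stage here uses plain LRU as a stand-in policy, and the adaptive reallocation of memory between stages is omitted.

```python
from collections import OrderedDict


class LRUStage:
    """One pipeline stage: a fixed-capacity LRU block (a stand-in for any simple policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def __contains__(self, key):
        return key in self.items

    def touch(self, key):
        # Mark the key as most recently used within this stage.
        self.items.move_to_end(key)

    def insert(self, key):
        """Insert a key; return the evicted key, or None if nothing was evicted."""
        self.items[key] = True
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            victim, _ = self.items.popitem(last=False)
            return victim
        return None


class PipelineCache:
    """Chains stages: a victim evicted from stage i is offered to stage i + 1 (assumed flow)."""

    def __init__(self, stages):
        self.stages = stages

    def access(self, key):
        """Return True on a hit, False on a miss."""
        for stage in self.stages:
            if key in stage:
                stage.touch(key)
                return True
        # Miss: admit at the first stage and cascade victims down the pipeline.
        victim = key
        for stage in self.stages:
            victim = stage.insert(victim)
            if victim is None:
                break
        return False

    def contents(self):
        return [list(stage.items) for stage in self.stages]


cache = PipelineCache([LRUStage(2), LRUStage(2)])
hits = [cache.access(k) for k in ["a", "b", "c", "a", "d", "e"]]
```

In this sketch, adding a new heuristic amounts to appending another stage object that exposes the same three methods, which is the kind of modularity the paragraph above describes. A real design would also need a rule for resizing the stages at run time.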

Latency-aware caching must also contend with delayed hits, which occur when an item is requested repeatedly while it is still being fetched into the cache. We introduce the Least Bursty Used (LBU) heuristic, which retains items exhibiting high burstiness even when they are neither recent nor frequent, thereby mitigating delayed hits that degrade request latency. We embed LBU within our pipeline and derive the Recency–Frequency–Burstiness (RFB) policy, which balances resources among recency, frequency, and burstiness. Evaluations on thirteen real-world storage traces from IBM, Twitter, and Meta, using latencies drawn from real-life deployments, show that RFB reduces average request latency by 10% compared to the best state-of-the-art alternative, while maintaining consistent performance, with a low standard deviation across bursty and non-bursty workloads.
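The cost of delayed hits can be illustrated with a small accounting sketch. It is not taken from the paper: it assumes a single fixed fetch latency, an unbounded cache (no eviction), and a hypothetical helper named `request_latencies`. A request for an item that is already in flight waits only for the remainder of the outstanding fetch, so it is neither a zero-latency hit nor a full-latency miss.

```python
def request_latencies(requests, fetch_latency):
    """Classify each (time, key) request and compute its latency.

    Assumptions (illustrative only): requests are sorted by time, every fetch
    takes `fetch_latency` time units, and the cache never evicts.
    """
    arrival = {}  # key -> time at which the item becomes available in the cache
    results = []
    for t, key in requests:
        if key not in arrival:
            # True miss: start a fetch and pay the full latency.
            arrival[key] = t + fetch_latency
            results.append(("miss", fetch_latency))
        elif t < arrival[key]:
            # Delayed hit: the item is in flight, so wait for the remaining time.
            results.append(("delayed_hit", arrival[key] - t))
        else:
            # True hit: the item has already arrived.
            results.append(("hit", 0))
    return results


trace = [(0, "a"), (2, "a"), (5, "a"), (10, "a"), (11, "b")]
results = request_latencies(trace, fetch_latency=10)
total_latency = sum(latency for _, latency in results)
```

In this trace the burst of requests for `a` at times 2 and 5 contributes 13 time units of latency even though only one fetch was issued, which is why a policy that ignores burstiness can underestimate the cost of evicting a bursty item.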

NSDI '26 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {316646,
author = {Nadav Keren and Gil Einziger and Gabriel Scalosub},
title = {{Latency-Aware} Caching with Delayed Hits: From Bursty Traffic to Pipeline Architectures},
booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
year = {2026},
isbn = {978-1-939133-54-0},
address = {Renton, WA},
pages = {2389--2405},
url = {https://www.usenix.org/conference/nsdi26/presentation/keren},
publisher = {USENIX Association},
month = may
}
