Skybridge: Bounded Staleness for Distributed Caches

Robert Lyerly, Meta Platforms Inc.; Scott Pruett, unaffiliated; Kevin Doherty and Greg Rogers, Meta Platforms Inc.; Nathan Bronson, OpenAI; John Hugg, Meta Platforms Inc.

Meta Platforms Inc. is a social media company whose products require high availability and low latency. Meta’s services run in multiple geographic locations around the world and use asynchronous replication to keep the numerous cached copies of the datastore in sync. This setup reduces consistency in order to meet availability and latency requirements. Eventual consistency due to asynchronous replication causes issues for Meta’s services, ranging from minor annoyances to product-breaking bugs. Therefore, we ask: can we put meaningful bounds on how long it takes writes to be visible while maintaining the scalability afforded by eventual consistency?

In this work we present Skybridge, an out-of-band replication stream that provides bounded staleness for distributed caches. Because Meta’s systems already have a reliable delivery stream, Skybridge focuses instead on real-time delivery of updates. Skybridge is complementary to the main replication pipeline and avoids correlated failures while remaining lightweight. We show that Skybridge helps provide 2-second bounded staleness for 99.99998% of writes, while the main replication pipeline alone achieves this only 99.993% of the time. Skybridge achieves this while being only 0.54% of the size of the cache deployments.
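To make the two percentages easier to compare, the short Python sketch below converts each one into the number of writes per million that miss the 2-second bound. It is only arithmetic on the figures quoted in the abstract. The function and variable names are illustrative, both figures are assumed to be fractions of writes, and the resulting ratio is derived here rather than reported by the paper.

```python
from decimal import Decimal

def misses_per_million(pct_within_bound: str) -> Decimal:
    """Writes per million that are NOT visible within the staleness bound.

    pct_within_bound is the percentage of writes that meet the bound,
    passed as a string so Decimal keeps the exact digits from the abstract.
    """
    miss_fraction = (Decimal(100) - Decimal(pct_within_bound)) / Decimal(100)
    return miss_fraction * Decimal(1_000_000)

# Figures quoted in the abstract (assumed here to both be fractions of writes).
with_skybridge = misses_per_million("99.99998")      # 0.2 misses per million
main_pipeline_only = misses_per_million("99.993")    # 70 misses per million

# Derived ratio (not a number stated in the paper): 350
reduction_factor = main_pipeline_only / with_skybridge

print(with_skybridge, main_pipeline_only, reduction_factor)
```

Under these assumptions, about 70 writes per million miss the bound with the main pipeline alone, versus about 0.2 per million with Skybridge, which is a reduction of roughly 350 times.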

OSDI '25 Open Access Sponsored by King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {308706,
author = {Robert Lyerly and Scott Pruett and Kevin Doherty and Greg Rogers and Nathan Bronson and John Hugg},
title = {Skybridge: Bounded Staleness for Distributed Caches},
booktitle = {19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)},
year = {2025},
isbn = {978-1-939133-47-2},
address = {Boston, MA},
pages = {187--204},
url = {https://www.usenix.org/conference/osdi25/presentation/lyerly},
publisher = {USENIX Association},
month = jul
}
