Baleen: ML Admission & Prefetching for Flash Caches


Daniel Lin-Kit Wong, Carnegie Mellon University; Hao Wu, Meta; Carson Molder, UT Austin; Sathya Gunasekar, Jimmy Lu, Snehal Khandkar, and Abhinav Sharma, Meta; Daniel S. Berger, Microsoft and University of Washington; Nathan Beckmann and Gregory R. Ganger, Carnegie Mellon University


Flash caches reduce peak backend load for throughput-constrained data center services, which lowers the total number of backend servers required. Bulk storage systems are a large-scale example: they are backed by high-capacity but low-throughput hard disks and use flash caches to provide a more cost-effective storage layer underlying everything from blobstores to data warehouses.

However, flash has limited write endurance, so flash caches must cap their long-term average flash write rate to avoid premature wearout. To do so, most flash caches use admission policies that filter cache insertions and maximize the workload-reduction value of each flash write.

The Baleen flash cache uses coordinated ML admission and prefetching to reduce peak backend load. After learning painful lessons with our early ML policy attempts, we exploit a new cache residency model (which we call episodes) to guide model training. We focus on optimizing for an end-to-end system metric (Disk-head Time) that measures backend load more accurately than IO miss rate or byte miss rate. Evaluation using Meta traces from seven storage clusters shows that Baleen reduces Peak Disk-head Time (and hence the number of backend hard disks required) by 12% over state-of-the-art policies for a fixed flash write rate constraint. Baleen-TCO, which chooses an optimal flash write rate, reduces our estimated total cost of ownership (TCO) by 17%. Code and traces are available at


@inproceedings {294815,
author = {Daniel Lin-Kit Wong and Hao Wu and Carson Molder and Sathya Gunasekar and Jimmy Lu and Snehal Khandkar and Abhinav Sharma and Daniel S. Berger and Nathan Beckmann and Gregory R. Ganger},
title = {Baleen: {ML} Admission \& Prefetching for Flash Caches},
booktitle = {22nd USENIX Conference on File and Storage Technologies (FAST 24)},
year = {2024},
isbn = {978-1-939133-38-0},
address = {Santa Clara, CA},
pages = {347--371},
url = {},
publisher = {USENIX Association},
month = feb
}
