Contextra: Hierarchical Context Caching for Long Context Language Model Serving

Zhiqiang Xie, Stanford University; Ziyi Xu, Shanghai Jiao Tong University; Mark Zhao, University of Colorado Boulder; Yuwei An, Carnegie Mellon University; Vikram Sharma Mailthody, NVIDIA; Scott Mahlke, University of Michigan & NVIDIA Research; Michael Garland, NVIDIA; Christos Kozyrakis, Stanford University & NVIDIA Research