Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
Title | Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention |
Publication Type | Conference Paper |
Year of Publication | 2024 |
Authors | Gao B, He Z, Sharma P, Kang Q, Jevdjic D, Deng J, Yang X, Yu Z, Zuo P |
Conference Name | 2024 USENIX Annual Technical Conference (USENIX ATC 24) |
Date Published | 07/2024 |
Publisher | USENIX Association |
Conference Location | Santa Clara, CA |
ISBN Number | 978-1-939133-41-0 |
URL | https://www.usenix.org/conference/atc24/presentation/gao-bin-cost |