Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
2024. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 117–134.