Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU–GPU Hybrid Design

Wenxin Wang, Tsinghua University; Yule Hou and Yu Ji, Xingyun; Peng Qu and Youhui Zhang, Tsinghua University