Kairox: Adaptive GPU-CPU Hybrid LLM Inference via Online Neuron Balancing

Yapeng Jiang and Minghao Gan, Sun Yat-sen University; Zicong Hong, Hong Kong University of Science and Technology; Wuhui Chen and Junyuan Liang, Sun Yat-sen University; Yue Yu, Pengcheng Laboratory; Meng Guo, Qilu University of Technology; Zibin Zheng, Sun Yat-sen University