Inference in the Shadows: Taming Memory Bandwidth Contention in Mobile LLM Inference with Sereno

Tong Xin, Xinrui Shi, Mingkai Dong, and Zeyu Mi, Shanghai Jiao Tong University