AdaEmbed: Adaptive Embedding for Large-Scale Recommendation Models

Fan Lai, University of Michigan; Wei Zhang, Rui Liu, William Tsai, Xiaohan Wei, Yuxi Hu, Sabin Devkota, Jianyu Huang, Jongsoo Park, Xing Liu, Zeliang Chen, Ellie Wen, Paul Rivera, Jie You, and Chun-cheng Jason Chen, Meta; Mosharaf Chowdhury, University of Michigan

Deep learning recommendation models (DLRMs) use increasingly large embedding tables to represent categorical sparse features such as video genres. Each embedding row of the table is the trainable weight vector for a specific instance of that feature. While increasing the number of embedding rows typically improves model accuracy by covering more feature instances, it can also raise deployment costs and slow model execution.

Unlike existing efforts that primarily focus on optimizing DLRMs for a given embedding, we present a complementary system, AdaEmbed, that reduces the size of embeddings needed for the same DLRM accuracy via in-training embedding pruning. Our key insight is that the access patterns and weights of embeddings are heterogeneous across embedding rows and change dynamically over the training process, implying varying embedding importance with respect to model accuracy. However, identifying important embeddings and then enforcing pruning for modern DLRMs with up to billions of embeddings (terabytes) is challenging. Given the total embedding size, AdaEmbed considers embeddings with higher runtime access frequencies and larger training gradients to be more important, and it dynamically prunes less important embeddings at scale to automatically determine per-feature embeddings. Our evaluations in industrial settings show that AdaEmbed reduces the embedding size needed in deployment by 35-60% and improves model execution speed by 11-34%, while achieving noticeable accuracy gains.
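To make the pruning idea concrete, here is a minimal Python sketch of ranking embedding rows under a global size budget. It is an illustration only, not AdaEmbed's implementation: the abstract does not give the scoring formula, so the score used here (access count multiplied by gradient magnitude) is an assumption, and the names `importance`, `select_rows`, and `RowKey` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical key for one embedding row: (feature name, row id).
RowKey = Tuple[str, int]


def importance(access_count: int, grad_norm: float) -> float:
    """Assumed score: rows that are accessed more often and receive
    larger gradients rank higher. The real formula may differ."""
    return access_count * grad_norm


def select_rows(
    stats: Dict[RowKey, Tuple[int, float]], budget: int
) -> Dict[str, List[int]]:
    """Keep the `budget` highest-scoring rows across all features and
    return the surviving row ids grouped by feature."""
    ranked = sorted(stats, key=lambda k: importance(*stats[k]), reverse=True)
    kept: Dict[str, List[int]] = {}
    for feature, row in ranked[:budget]:
        kept.setdefault(feature, []).append(row)
    return {feature: sorted(rows) for feature, rows in kept.items()}


if __name__ == "__main__":
    # Made-up statistics: (access count, gradient norm) per row.
    stats = {
        ("genre", 0): (100, 0.5),
        ("genre", 1): (2, 0.1),
        ("genre", 2): (50, 0.4),
        ("user", 0): (10, 3.0),
        ("user", 1): (1, 0.05),
    }
    print(select_rows(stats, budget=3))
```

Because the budget is applied across all features at once, the number of rows each feature keeps falls out of the ranking instead of being set by hand, which mirrors the abstract's point about automatically determining per-feature embeddings. A real system would recompute the statistics as training proceeds, since the abstract notes that importance changes over time.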

BibTeX

@inproceedings {288570,
author = {Fan Lai and Wei Zhang and Rui Liu and William Tsai and Xiaohan Wei and Yuxi Hu and Sabin Devkota and Jianyu Huang and Jongsoo Park and Xing Liu and Zeliang Chen and Ellie Wen and Paul Rivera and Jie You and Chun-cheng Jason Chen and Mosharaf Chowdhury},
title = {{AdaEmbed}: Adaptive Embedding for {Large-Scale} Recommendation Models},
booktitle = {17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)},
year = {2023},
isbn = {978-1-939133-34-2},
address = {Boston, MA},
pages = {817--831},
url = {https://www.usenix.org/conference/osdi23/presentation/lai},
publisher = {USENIX Association},
month = jul
}
