Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models

Title: Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models
Publication Type: Conference Paper
Year of Publication: 2022
Authors: Eisenman A, Matam KKumar, Ingram S, Mudigere D, Krishnamoorthi R, Nair K, Smelyanskiy M, Annavaram M
Conference Name: 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)
Date Published: 04/2022
Publisher: USENIX Association
Conference Location: Renton, WA
ISBN Number: 978-1-939133-27-4
URL: https://www.usenix.org/conference/nsdi22/presentation/eisenman