Biblio

Filters: Author is Alireza Ghaffarkhah
2024
Zu Y, Ghaffarkhah A, Dang H-V, Towles B, Hand S, Huda S, Bello A, Kolbasov A, Rezaei A, Du D et al. 2024. Resiliency at Scale: Managing Google’s TPUv4 Machine Learning Supercomputer. 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). pp. 761–774.