MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
2024. MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), 745–760.