Optimal Software Pipelining and Warp Specialization for Tensor Core GPUs

Rupanshu Soi, Rohan Yadav, Fredrik Kjolstad, and Alex Aiken, Stanford University; Maryam Mehri Dehnavi, Michael Garland, and Michael Bauer, NVIDIA