Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training

Authors: 

Jie You, Jae-Won Chung, and Mosharaf Chowdhury, University of Michigan

Abstract: 

Training deep neural networks (DNNs) is becoming more resource- and energy-intensive every year. Unfortunately, existing work primarily focuses on optimizing DNN training for faster completion, often without considering the impact on energy efficiency.

In this paper, we observe that common practices to improve training performance can often lead to inefficient energy usage. More importantly, we demonstrate that there is a tradeoff between energy consumption and performance optimization. To this end, we propose Zeus, an optimization framework to navigate this tradeoff by automatically finding optimal job- and GPU-level configurations for recurring DNN training jobs. Zeus uses an online exploration-exploitation approach in conjunction with just-in-time energy profiling, averting the need for expensive offline measurements, while adapting to data drifts over time. Our evaluation shows that Zeus can improve the energy efficiency of DNN training by 15.3%–75.8% for diverse workloads.
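To make the exploration-exploitation idea concrete, here is a minimal sketch of how a recurring job might pick its configuration from energy observed in earlier runs. It assumes an epsilon-greedy policy, which is a simple stand-in and not necessarily the algorithm Zeus uses. The configuration space (batch size and GPU power limit), the function names, and the simulated energy model are all hypothetical; a real system would measure energy on the GPU while the job runs.

```python
import random


def choose_config(stats, configs, epsilon, rng):
    """Try each config once, then explore with probability epsilon,
    otherwise exploit the config with the lowest mean observed cost."""
    untried = [c for c in configs if c not in stats]
    if untried:
        return untried[0]
    if rng.random() < epsilon:
        return rng.choice(configs)
    return min(configs, key=lambda c: stats[c][0] / stats[c][1])


def record(stats, config, cost):
    """Accumulate (total cost, run count) for a config."""
    total, count = stats.get(config, (0.0, 0))
    stats[config] = (total + cost, count + 1)


def simulated_energy_joules(config, rng):
    """Hypothetical stand-in for a measured per-job energy cost.
    Cheapest at batch size 64 and a 200 W power limit, with 2% noise."""
    batch_size, power_limit_w = config
    base = 1000.0 + abs(batch_size - 64) * 5.0 + abs(power_limit_w - 200) * 2.0
    return base * rng.uniform(0.98, 1.02)


def run_recurring_jobs(num_jobs=200, epsilon=0.1, seed=0):
    """Run the same job repeatedly, learning online which config is cheapest."""
    rng = random.Random(seed)
    # Assumed search space: (batch size, GPU power limit in watts).
    configs = [(b, p) for b in (32, 64, 128) for p in (150, 200, 250)]
    stats = {}
    for _ in range(num_jobs):
        config = choose_config(stats, configs, epsilon, rng)
        record(stats, config, simulated_energy_joules(config, rng))
    best = min(stats, key=lambda c: stats[c][0] / stats[c][1])
    return best, stats


if __name__ == "__main__":
    best, stats = run_recurring_jobs()
    print("lowest mean energy config:", best)
```

Because every recurrence of the job doubles as a measurement, no separate offline profiling pass is needed in this sketch; the occasional exploratory run is also what would let the policy notice if the cheapest configuration shifts over time.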

NSDI '23 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {285082,
author = {Jie You and Jae-Won Chung and Mosharaf Chowdhury},
title = {Zeus: Understanding and Optimizing {GPU} Energy Consumption of {DNN} Training},
booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
year = {2023},
isbn = {978-1-939133-33-5},
address = {Boston, MA},
pages = {119--139},
url = {https://www.usenix.org/conference/nsdi23/presentation/you},
publisher = {USENIX Association},
month = apr
}