Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters


Qinghao Hu, Nanyang Technological University, S-Lab, NTU, and Shanghai AI Laboratory; Zhisheng Ye, Shanghai AI Laboratory and Peking University; Meng Zhang, Nanyang Technological University, S-Lab, NTU, and Shanghai AI Laboratory; Qiaoling Chen, Shanghai AI Laboratory and National University of Singapore; Peng Sun, Shanghai AI Laboratory and SenseTime Research; Yonggang Wen and Tianwei Zhang, Nanyang Technological University


Hyperparameter tuning is an essential step in deep learning model development that provides better model performance at the cost of substantial resources. While existing systems can improve tuning efficiency, they still fail to handle large models with billions of parameters or to leverage cluster resources efficiently. Motivated by these deficiencies, we introduce Hydro, a surrogate-based hyperparameter tuning service that optimizes tuning workloads at both job-level and cluster-level granularities. Specifically, it consists of two key components: (1) Hydro Tuner automatically generates and optimizes surrogate models via scaling, parametrization and fusion; (2) Hydro Coordinator improves tuning efficiency and cluster-wide resource utilization by adaptively leveraging ephemeral and heterogeneous resources. Our comprehensive experiments on two tuning algorithms across six models show that Hydro Tuner can reduce tuning makespan by up to 78.5x compared with Ray Tune, with no reduction in tuning quality. Hydro's source code is publicly available at https://github.com/S-Lab-System-Group/Hydro.

@inproceedings {288566,
author = {Qinghao Hu and Zhisheng Ye and Meng Zhang and Qiaoling Chen and Peng Sun and Yonggang Wen and Tianwei Zhang},
title = {Hydro: {Surrogate-Based} Hyperparameter Tuning Service in Datacenters},
booktitle = {17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)},
year = {2023},
isbn = {978-1-939133-34-2},
address = {Boston, MA},
pages = {757--777},
url = {https://www.usenix.org/conference/osdi23/presentation/hu},
publisher = {USENIX Association},
month = jul
}
