Biblio

Filters: Author is Hao Zhang

2024

Zhong Y, Liu S, Chen J, Hu J, Zhu Y, Liu X, Jin X, Zhang H. 2024. DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving. 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). :193--210.

2023

Li Z, Zheng L, Zhong Y, Liu V, Sheng Y, Jin X, Huang Y, Chen Z, Zhang H, Gonzalez JE et al. 2023. AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving. 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23). :663--679.

2022

Zheng L, Li Z, Zhang H, Zhuang Y, Chen Z, Huang Y, Wang Y, Xu Y, Zhuo D, Xing EP et al. 2022. Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). :559--578.

2021

Qiao A, Choe SK, Subramanya SJ, Neiswanger W, Ho Q, Zhang H, Ganger GR, Xing EP. 2021. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning. 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). :1--18.

2018

Xu S, Zhang H, Neubig G, Dai W, Kim JK, Deng Z, Ho Q, Yang G, Xing EP. 2018. Cavs: An Efficient Runtime System for Dynamic Neural Networks. 2018 USENIX Annual Technical Conference (USENIX ATC 18). :937--950.

2017

Zhang H, Zheng Z, Xu S, Dai W, Ho Q, Liang X, Hu Z, Wei J, Xie P, Xing EP. 2017. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters. 2017 USENIX Annual Technical Conference (USENIX ATC 17). :181--193.