Biblio
Export 8 results:
Filters: Author is Yuxiong He
2024. Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs. 2024 USENIX Annual Technical Conference (USENIX ATC 24). :699--713.

2021. ZeRO-Offload: Democratizing Billion-Scale Model Training. 2021 USENIX Annual Technical Conference (USENIX ATC 21). :551--564.

2019. Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft. 2019 USENIX Conference on Operational Machine Learning (OpML 19). :5--7.

2019. Deep Learning Inference Service at Microsoft. 2019 USENIX Conference on Operational Machine Learning (OpML 19). :15--17.

2018. DeepCPU: Serving RNN-based Deep Learning Models 10x Faster. 2018 USENIX Annual Technical Conference (USENIX ATC 18). :951--965.

2014. A Theoretical Foundation for Scheduling and Designing Heterogeneous Processors for Interactive Applications. 11th International Conference on Autonomic Computing (ICAC 14).

2013. Exploiting Processor Heterogeneity in Interactive Services. 10th International Conference on Autonomic Computing (ICAC 13). :45--58.

2013. Performance Inconsistency in Large Scale Data Processing Clusters. 10th International Conference on Autonomic Computing (ICAC 13). :297--302.