Search results

    Title | Conference | Speaker(s)
    PPipe: Efficient Video Analytics Serving on Heterogeneous GPU Clusters via Pool-Based Pipeline Parallelism | USENIX ATC '25 | Z. Jonny Kong, Qiang Xu, Y. Charlie Hu
    Voltrix: Sparse Matrix-Matrix Multiplication on Tensor Cores with Asynchronous and Balanced Kernel Optimization | USENIX ATC '25 | Yaqi Xia, Weihu Wang, Donglin Yang, Xiaobo Zhou, Dazhao Cheng
    GREYHOUND: Hunting Fail-Slows in Hybrid-Parallel Training at Scale | USENIX ATC '25 | Tianyuan Wu, Wei Wang, Yinghao Yu, Siran Yang, Wenchao Wu, Qinkai Duan, Guodong Yang, Jiamang Wang, Lin Qu, Liping Zhang
    LEOCraft: Towards Designing Performant LEO Networks | USENIX ATC '25 | Suvam Basak, Amitangshu Pal, Debopam Bhattacherjee
    Fast Distributed Transactions for RDMA-based Disaggregated Memory | USENIX ATC '25 | Haodi Lu, Haikun Liu, Yujian Zhang, Zhuohui Duan, Xiaofei Liao, Hai Jin, Yu Zhang
    Katz: Efficient Workflow Serving for Diffusion Models with Many Adapters | USENIX ATC '25 | Suyi Li, Lingyun Yang, Xiaoxiao Jiang, Hanfeng Lu, Dakai An, Zhipeng Di, Weiyi Lu, Jiawei Chen, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang
    CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training | USENIX ATC '25 | Tiancheng Chen, Ales Kubicek, Langwen Huang, Torsten Hoefler
    Unveiling Compiler Faults via Attribute-Guided Compilation Space Exploration | USENIX ATC '25 | Jiangchang Wu, Yibiao Yang, Maolin Sun, Yuming Zhou
    Understanding and Detecting Fail-Slow Hardware Failure Bugs in Cloud Systems | USENIX ATC '25 | Gen Dong, Yu Hua, Yongle Zhang, Zhangyu Chen, Menglei Chen
    Para-ksm: Parallelized Memory Deduplication with Data Streaming Accelerator | USENIX ATC '25 | Houxiang Ji, Minho Kim, Seonmu Oh, Daehoon Kim, Nam Sung Kim
    DSA-2LM: A CPU-Free Tiered Memory Architecture with Intel DSA | USENIX ATC '25 | Ruili Liu, Teng Ma, Mingxing Zhang, Jialiang Huang, Yingdi Shan, Zheng Liu, Lingfeng Xiang, Zhen Lin, Hui Lu, Jia Rao, Kang Chen, Yongwei Wu
    Turbocharge ANNS on Real Processing-in-Memory by Enabling Fine-Grained Per-PIM-Core Scheduling | USENIX ATC '25 | Puqing Wu, Minhui Xie, Enrui Zhao, Dafang Zhang, Jing Wang, Xiao Liang, Kai Ren, Yunpeng Chai
    ShieldReduce: Fine-Grained Shielded Data Reduction | USENIX ATC '25 | Jingyuan Yang, Jun Wu, Ruilin Wu, Jingwei Li, Patrick P. C. Lee, Xiong Li, Xiaosong Zhang
    Separate but Together: Integrating Remote Attestation into TLS | USENIX ATC '25 | Carsten Weinhold, Muhammad Usama Sardar, Ionuț Mihalcea, Yogesh Deshpande, Hannes Tschofenig, Yaron Sheffer, Thomas Fossati, Michael Roitzsch
    SpaceExit: Enabling Efficient Adaptive Computing in Space with Early Exits | USENIX ATC '25 | Jiacheng Liu, Xiaozhi Zhu, Tongqiao Xu, Xiaofeng Hou, Chao Li
    XRT: An Accelerator-Aware Runtime for Accelerated Chip Multiprocessors | USENIX ATC '25 | Neel Patel, Mohammad Alian
    Revealing Floating-Point Accumulation Orders in Software/Hardware Implementations | USENIX ATC '25 | Peichen Xie, Yanjie Gao, Yang Wang, Jilong Xue
    IRHash: Efficient Multi-Language Compiler Caching by IR-Level Hashing | USENIX ATC '25 | Tobias Landsberg, Johannes Grunenberg, Christian Dietrich, Daniel Lohmann
    On-Demand Container Partitioning for Distributed ML | USENIX ATC '25 | Giovanni Bartolomeo, Navidreza Asadi, Wolfgang Kellerer, Jorg Ott, Nitinder Mohan
    Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelism | USENIX ATC '25 | Xinyu Lian, Sam Ade Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, Minjia Zhang
    SAVE: Software-Implemented Fault Tolerance for Model Inference against GPU Memory Bit Flips | USENIX ATC '25 | Wenxin Zheng, Bin Xu, Jinyu Gu, Haibo Chen
    Resource Multiplexing in Tuning and Serving Large Language Models | USENIX ATC '25 | Yongjun He, Haofeng Yang, Yao Lu, Ana Klimovic, Gustavo Alonso
    Colocating ML Inference and Training with Fast GPU Memory Handover | USENIX ATC '25 | Jiali Wang, Yankui Wang, Mingcong Han, Rong Chen
    Tigon: A Distributed Database for a CXL Pod | OSDI '25 | Yibo Huang, Haowei Chen, Newton Ni, Yan Sun, Vijay Chidambaram, Dixin Tang, Emmett Witchel
    Achieving Low-Latency Graph-Based Vector Search via Aligning Best-First Search Algorithm with SSD | OSDI '25 | Hao Guo, Youyou Lu
