Morphe: High-Fidelity Generative Video Streaming with Vision Foundation Model

Tianyi Gong, Zijian Cao, and Zixing Zhang, The Chinese University of Hong Kong, Shenzhen, and Shenzhen Future Network of Intelligence Institute; Jiangkai Wu and Xinggong Zhang, Peking University; Shuguang Cui and Fangxin Wang, The Chinese University of Hong Kong, Shenzhen, and Shenzhen Future Network of Intelligence Institute

Video streaming is a fundamental Internet service, yet its quality still cannot be guaranteed, especially under poor network conditions such as bandwidth-constrained links and remote areas. Existing work follows two directions: traditional pixel-codec streaming is approaching its compression limit and is hard to push further, while emerging neural-enhanced or generative streaming usually falls short in latency and visual fidelity, hindering its practical deployment.

Inspired by the recent success of vision foundation models (VFMs), we strive to harness their powerful video understanding and processing capacities to achieve generalization, high fidelity, and loss resilience for real-time video streaming at an even higher compression rate. We present Morphe, the first paradigm that enables VFM-based end-to-end generative video streaming towards this goal. Specifically, Morphe employs joint training of visual tokenizers and variable-resolution spatiotemporal optimization under simulated network constraints. Additionally, we build a robust streaming system that leverages intelligent packet dropping to resist real-world network perturbations. Extensive evaluation demonstrates that Morphe achieves comparable visual quality while saving 62.5% bandwidth compared to H.265, and delivers real-time, loss-resilient video in challenging network environments, representing a milestone in VFM-enabled multimedia streaming solutions.
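To put the reported 62.5% bandwidth saving in concrete terms, the short sketch below converts a saving fraction into the remaining share of the baseline bitrate and into the equivalent compression multiple. Only the 62.5% figure comes from the abstract; the function names and the 800 kbps H.265 bitrate are hypothetical values chosen purely for illustration, not measurements from the paper.

```python
def relative_bitrate(saving_fraction: float) -> float:
    """Share of the baseline bitrate still needed after the stated saving."""
    if not 0.0 <= saving_fraction < 1.0:
        raise ValueError("saving_fraction must be in [0, 1)")
    return 1.0 - saving_fraction


def baseline_multiple(saving_fraction: float) -> float:
    """How many times more bitrate the baseline needs at comparable quality."""
    return 1.0 / relative_bitrate(saving_fraction)


SAVING_VS_H265 = 0.625           # reported in the abstract
HYPOTHETICAL_H265_KBPS = 800.0   # assumption: illustrative baseline bitrate only

# Bitrate that would remain if the saving applied to the hypothetical baseline.
morphe_kbps = HYPOTHETICAL_H265_KBPS * relative_bitrate(SAVING_VS_H265)

if __name__ == "__main__":
    print(f"remaining share of baseline: {relative_bitrate(SAVING_VS_H265):.3f}")
    print(f"baseline needs {baseline_multiple(SAVING_VS_H265):.2f}x the bitrate")
    print(f"hypothetical stream: {morphe_kbps:.0f} kbps")
```

Under these assumptions, a 62.5% saving means the stream needs 37.5% of the H.265 bitrate, so H.265 would need roughly 2.67 times as much bandwidth for comparable visual quality.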

NSDI '26 Open Access Sponsored by King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {316606,
author = {Tianyi Gong and Zijian Cao and Zixing Zhang and Jiangkai Wu and Xinggong Zhang and Shuguang Cui and Fangxin Wang},
title = {Morphe: {High-Fidelity} Generative Video Streaming with Vision Foundation Model},
booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
year = {2026},
isbn = {978-1-939133-54-0},
address = {Renton, WA},
pages = {301--317},
url = {https://www.usenix.org/conference/nsdi26/presentation/gong},
publisher = {USENIX Association},
month = may
}
