Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: keeping backgrounds and non-edited regions stable over time, and achieving the low latency required for real-time interactive scenarios. Recent streaming video generation methods are mostly developed for synthesis and cannot be directly applied to editing, which imposes strict preservation requirements and needs region-specific control. We present LiveEdit, a streaming video editing framework that performs causal, frame-by-frame editing with strong content preservation and real-time responsiveness. LiveEdit uses a three-stage distillation pipeline to transfer editing capability from a powerful bidirectional foundation model to an efficient unidirectional streaming editor. To further support real-time deployment, an AR-oriented mask cache reuses region-related computation across frames, reducing redundant processing while preserving visual quality. Extensive evaluations show that LiveEdit achieves state-of-the-art quality among streaming baselines at an inference speed of 12.66 FPS.
Representative real-time editing cases with source video, target video, and editing instruction.
@inproceedings{wang2026liveedit,
title={LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing},
author={Wang, Xinyu and Zhao, Chongbo and Zhan, Fangneng and Ma, Yue},
booktitle={European Conference on Computer Vision},
year={2026},
organization={Springer}
}