

EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing

February 16, 2026
Authors: Yehonathan Litman, Shikun Liu, Dario Seyb, Nicholas Milef, Yang Zhou, Carl Marshall, Shubham Tulsiani, Caleb Leak
cs.AI

Abstract

High-fidelity generative video editing has seen significant quality improvements by leveraging pre-trained video foundation models. However, their computational cost is a major bottleneck: these methods are often designed to process the full video context regardless of the inpainting mask's size, which is inefficient for sparse, localized edits. In this paper, we introduce EditCtrl, an efficient video inpainting control framework that focuses computation only where it is needed. Our approach features a novel local video context module that operates solely on masked tokens, yielding a computational cost proportional to the edit size. This local-first generation is guided by a lightweight temporal global context embedder that ensures video-wide context consistency with minimal overhead. EditCtrl is not only 10 times more compute-efficient than state-of-the-art generative editing methods, it also improves editing quality compared to methods designed with full attention. Finally, we showcase how EditCtrl unlocks new capabilities, including multi-region editing with text prompts and autoregressive content propagation.
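
The abstract's central idea is architectural: run the expensive attention only over tokens inside the edit mask, while a cheap global summary of the whole video keeps the edit consistent with its surroundings. The sketch below is a minimal, hypothetical illustration of that split, not the authors' implementation; the module names, shapes, mean-pooling for the global summary, and the way the global token is prepended are all assumptions made for clarity.

```python
# Minimal sketch (not the authors' code): heavy attention runs only over
# masked tokens, conditioned on a lightweight global summary of the video.
# All names, shapes, and the pooling strategy are illustrative assumptions.
import torch
import torch.nn as nn


class LocalFirstEditor(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8, layers: int = 4):
        super().__init__()
        # Heavy local module: full self-attention, but only over masked tokens.
        block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.local_context = nn.TransformerEncoder(block, num_layers=layers)
        # Lightweight global embedder: a single projection of pooled video tokens.
        self.global_embedder = nn.Linear(dim, dim)

    def forward(self, video_tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        """
        video_tokens: (B, N, D) tokens for the whole video.
        mask:         (B, N) boolean, True where the video should be edited.
        Returns tokens of shape (B, N, D) with only masked positions updated.
        """
        # Cheap global context: mean-pool all tokens, then project -> (B, 1, D).
        global_ctx = self.global_embedder(video_tokens.mean(dim=1, keepdim=True))

        out = video_tokens.clone()
        for b in range(video_tokens.size(0)):
            local = video_tokens[b][mask[b]]             # (M, D), M = masked tokens
            if local.numel() == 0:
                continue
            # Prepend the global summary so local attention sees video-wide context.
            seq = torch.cat([global_ctx[b], local], dim=0).unsqueeze(0)
            edited = self.local_context(seq)[0, 1:]      # drop the global slot
            out[b][mask[b]] = edited                     # write back only masked tokens
        return out


if __name__ == "__main__":
    B, N, D = 1, 1024, 512
    tokens = torch.randn(B, N, D)
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask[:, :64] = True                                  # a sparse, localized edit
    print(LocalFirstEditor(dim=D)(tokens, mask).shape)   # torch.Size([1, 1024, 512])
```

In this toy setup the transformer's quadratic cost depends on the number of masked tokens (64) rather than the full sequence length (1024), which is the proportional-to-edit-size behavior the abstract describes; the real system presumably uses a temporal encoder rather than simple mean pooling for the global context.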