

LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning

June 11, 2025
Authors: Chenjian Gao, Lihe Ding, Xin Cai, Zhanpeng Huang, Zibin Wang, Tianfan Xue
cs.AI

Abstract

Video editing using diffusion models has achieved remarkable results in generating high-quality edits for videos. However, current methods often rely on large-scale pretraining, which limits flexibility for specific edits. First-frame-guided editing provides control over the first frame but lacks flexibility over subsequent frames. To address this, we propose a mask-based LoRA (Low-Rank Adaptation) tuning method that adapts pretrained Image-to-Video (I2V) models for flexible video editing. Our approach preserves background regions while enabling controllable edit propagation, offering efficient and adaptable video editing without altering the model architecture. To better steer this process, we incorporate additional references, such as alternate viewpoints or representative scene states, which serve as visual anchors for how content should unfold. We address the control challenge with a mask-driven LoRA tuning strategy that adapts a pretrained image-to-video model to the editing context. The model must learn from two distinct sources: the input video provides spatial structure and motion cues, while reference images offer appearance guidance. A spatial mask enables region-specific learning by dynamically modulating what the model attends to, ensuring that each region draws from the appropriate source. Experimental results show that our method achieves superior video editing performance compared to state-of-the-art methods.
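As a rough illustration of the region-specific learning described above, the sketch below blends two denoising targets with a spatial mask, so that edit regions are supervised by the reference image while background regions are supervised by the input video. The function name, tensor shapes, and the plain MSE objective are assumptions for illustration only, not the paper's actual training code.

```python
import torch

def mask_aware_loss(pred_noise, noise_video, noise_ref, mask):
    """Mask-weighted denoising loss (illustrative sketch).

    pred_noise:  model's predicted noise, shape (B, C, H, W)
    noise_video: target derived from the input video (structure/motion)
    noise_ref:   target derived from the reference image (appearance)
    mask:        spatial mask, shape (B, 1, H, W); 1 = edit region,
                 0 = preserved background
    """
    # Each pixel draws its supervision from exactly one source,
    # selected by the mask (broadcast over the channel dimension).
    target = mask * noise_ref + (1.0 - mask) * noise_video
    return torch.mean((pred_noise - target) ** 2)
```

In an actual mask-aware LoRA fine-tuning setup, only the LoRA adapter weights of the pretrained I2V model would be updated against such an objective, leaving the base weights frozen.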
PDF · June 16, 2025