MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
September 2, 2023
Authors: Hanshu Yan, Jun Hao Liew, Long Mai, Shanchuan Lin, Jiashi Feng
cs.AI
Abstract
This paper addresses the issue of modifying the visual appearance of videos
while preserving their motion. A novel framework, named MagicProp, is proposed,
which disentangles the video editing process into two stages: appearance
editing and motion-aware appearance propagation. In the first stage, MagicProp
selects a single frame from the input video and applies image-editing
techniques to modify the content and/or style of the frame. The flexibility of
these techniques enables the editing of arbitrary regions within the frame. In
the second stage, MagicProp employs the edited frame as an appearance reference
and generates the remaining frames using an autoregressive rendering approach.
To achieve this, a diffusion-based conditional generation model, called
PropDPM, is developed, which synthesizes each target frame by conditioning on
the reference appearance, the target motion, and the previous frame's appearance. The
autoregressive editing approach ensures temporal consistency in the resulting
videos. Overall, MagicProp combines the flexibility of image-editing techniques
with the superior temporal consistency of autoregressive modeling, enabling
flexible editing of object types and aesthetic styles in arbitrary regions of
input videos while maintaining good temporal consistency across frames.
Extensive experiments in various video editing scenarios demonstrate the
effectiveness of MagicProp.
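The two-stage pipeline described in the abstract can be sketched as a simple loop: edit one key frame, then autoregressively render each subsequent frame conditioned on the edited reference, the target motion, and the previously rendered frame. This is a minimal illustration only; `edit_reference_frame` and `prop_dpm_step` are hypothetical stand-ins (the paper's stage 1 uses off-the-shelf image-editing techniques and stage 2 a trained diffusion model, PropDPM), and the blend weights here are arbitrary placeholders.

```python
import numpy as np

def edit_reference_frame(frame):
    # Stand-in for stage 1: any image-editing technique applied to the
    # selected key frame. Here, a trivial brightness/contrast change.
    return np.clip(frame * 0.5 + 0.25, 0.0, 1.0)

def prop_dpm_step(reference, motion, previous):
    # Hypothetical stand-in for one PropDPM sampling call, which in the
    # paper conditions on the reference appearance, the target motion,
    # and the previous frame's appearance. Here, a weighted blend.
    return np.clip(0.5 * reference + 0.3 * previous + 0.2 * motion, 0.0, 1.0)

def magicprop(frames, motions, key_index=0):
    """Sketch of the two-stage pipeline: edit one key frame, then
    autoregressively propagate its appearance through the clip."""
    edited = [None] * len(frames)
    reference = edit_reference_frame(frames[key_index])
    edited[key_index] = reference
    previous = reference
    for t in range(key_index + 1, len(frames)):
        # Each frame is rendered from the shared reference plus the
        # immediately preceding output, which is what enforces temporal
        # consistency in the autoregressive scheme.
        previous = prop_dpm_step(reference, motions[t], previous)
        edited[t] = previous
    return edited

# Toy clip: 4 frames of 8x8 RGB noise with per-frame "motion" maps.
rng = np.random.default_rng(0)
frames = [rng.random((8, 8, 3)) for _ in range(4)]
motions = [rng.random((8, 8, 3)) for _ in range(4)]
out = magicprop(frames, motions)
```

The key structural point is that every generated frame sees both the fixed reference (so the edited appearance never drifts) and its immediate predecessor (so transitions stay smooth).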