DragVideo: Interactive Drag-style Video Editing
December 3, 2023
Authors: Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai, Chi-Keung Tang
cs.AI
Abstract
Editing visual content in videos remains a formidable challenge, with two main
issues: 1) direct and easy user control that produces 2) natural editing
results free of unsightly distortion and artifacts after changes to shape,
expression, and layout. Inspired by DragGAN, a recent image-based drag-style
editing technique, we address the above issues by proposing DragVideo, which
adopts a similar drag-style user interaction to edit video content while
maintaining temporal consistency. Empowered by recent diffusion models, as in
DragDiffusion, DragVideo introduces the novel Drag-on-Video U-Net (DoVe)
editing method, which optimizes the diffused video latents generated by a
video U-Net to achieve the desired control. Specifically, we use
sample-specific LoRA fine-tuning and mutual self-attention control to ensure
that the DoVe method faithfully reconstructs the video. We also present a
series of test examples for drag-style video editing and conduct extensive
experiments across a wide array of challenging editing tasks, such as motion
editing and skeleton editing, underscoring DragVideo's versatility and
generality. Our code, including the DragVideo web user interface, will be
released.
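The core idea of drag-style editing — iteratively nudging a feature map so the content at a handle point migrates toward a target point — can be illustrated with a toy sketch. This is not the DoVe method itself: the paper optimizes diffused video latents with gradient-based motion supervision, whereas the sketch below (with the hypothetical function `drag_step`) merely blends NumPy array features one pixel step at a time to convey the handle-to-target update loop.

```python
import numpy as np

def drag_step(latent, handle, target, lr=0.5):
    """One toy 'motion supervision' step: advance the handle point one unit
    toward the target and blend the handle's feature into the new location.
    This illustrates the drag update loop only; real methods optimize
    latents by gradient descent on a feature-matching loss."""
    h = np.array(handle, dtype=float)
    t = np.array(target, dtype=float)
    d = t - h
    dist = np.linalg.norm(d)
    if dist < 1e-6:                      # already at the target
        return latent, handle
    step = d / dist                      # unit vector toward the target
    new = tuple(np.round(h + step).astype(int))
    out = latent.copy()
    # Blend the feature carried by the handle into the next position.
    out[new] = (1 - lr) * out[new] + lr * latent[tuple(handle)]
    return out, new

# Usage: drag a feature from (2, 2) to (2, 5) on an 8x8 toy "latent".
lat = np.zeros((8, 8))
lat[2, 2] = 1.0
h, tgt = (2, 2), (2, 5)
for _ in range(10):
    if tuple(h) == tgt:
        break
    lat, h = drag_step(lat, h, tgt)
```

In the actual DragVideo setting this loop runs over diffusion latents for all frames jointly, which is what maintains temporal consistency across the edited video.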