DragVideo: Interactive Drag-style Video Editing
December 3, 2023
Authors: Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai, Chi-Keung Tang
cs.AI
Abstract
Editing visual content in videos remains a formidable challenge, with two main
issues: 1) direct and easy user control that produces 2) natural editing
results free of unsightly distortion and artifacts after changes to shape,
expression, and layout. Inspired by DragGAN, a recent image-based drag-style
editing technique, we address the above issues by proposing DragVideo, which
adopts a similar drag-style user interaction to edit video content while
maintaining temporal consistency. Empowered by recent diffusion models, as in
DragDiffusion, DragVideo introduces the novel Drag-on-Video U-Net (DoVe)
editing method, which optimizes the diffused video latents generated by a
video U-Net to achieve the desired control. Specifically, we use
sample-specific LoRA fine-tuning and mutual self-attention control to ensure
that the DoVe method faithfully reconstructs the video. We also present a
series of test examples for drag-style video editing and conduct extensive
experiments across a wide array of challenging editing tasks, such as motion
editing and skeleton editing, underscoring DragVideo's versatility and
generality. Our code, including the DragVideo web user interface, will be
released.
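The core idea of drag-style editing — iteratively nudging a feature map so the content at a handle point migrates toward a target point — can be illustrated with a toy sketch. This is not the DoVe method itself: the paper optimizes diffused video latents with gradient-based motion supervision, whereas the sketch below (with the hypothetical function `drag_step`) merely blends NumPy array features one pixel step at a time to convey the handle-to-target update loop.

```python
import numpy as np

def drag_step(latent, handle, target, lr=0.5):
    """One toy 'motion supervision' step: advance the handle point one unit
    toward the target and blend the handle's feature into the new location.
    This illustrates the drag update loop only; real methods optimize
    latents by gradient descent on a feature-matching loss."""
    h = np.array(handle, dtype=float)
    t = np.array(target, dtype=float)
    d = t - h
    dist = np.linalg.norm(d)
    if dist < 1e-6:                      # already at the target
        return latent, handle
    step = d / dist                      # unit vector toward the target
    new = tuple(np.round(h + step).astype(int))
    out = latent.copy()
    # Blend the feature carried by the handle into the next position.
    out[new] = (1 - lr) * out[new] + lr * latent[tuple(handle)]
    return out, new

# Usage: drag a feature from (2, 2) to (2, 5) on an 8x8 toy "latent".
lat = np.zeros((8, 8))
lat[2, 2] = 1.0
h, tgt = (2, 2), (2, 5)
for _ in range(10):
    if tuple(h) == tgt:
        break
    lat, h = drag_step(lat, h, tgt)
```

In the actual DragVideo setting this loop runs over diffusion latents for all frames jointly, which is what maintains temporal consistency across the edited video.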