VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
December 1, 2023
Authors: Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
cs.AI
Abstract
Text-to-video diffusion models have advanced video generation significantly.
However, customizing these models to generate videos with tailored motions
presents a substantial challenge. Specifically, they encounter hurdles in (a)
accurately reproducing motion from a target video, and (b) creating diverse
visual variations. For example, straightforward extensions of static image
customization methods to video often lead to intricate entanglements of
appearance and motion data. To tackle this, we present the Video Motion
Customization (VMC) framework, a novel one-shot tuning approach crafted to
adapt temporal attention layers within video diffusion models. Our approach
introduces a novel motion distillation objective using residual vectors between
consecutive frames as a motion reference. The diffusion process then preserves
low-frequency motion trajectories while mitigating high-frequency
motion-unrelated noise in image space. We validate our method against
state-of-the-art video generative models across diverse real-world motions and
contexts. Our code, data, and project demo can be found at
https://video-motion-customization.github.io
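The abstract's central quantity, the residual vector between consecutive frames used as a motion reference, is easy to state in code. Below is a minimal PyTorch sketch of a residual-based alignment loss under that idea; the function names (frame_residuals, motion_distillation_loss), the latent tensor shapes, and the cosine-alignment form are illustrative assumptions, not the paper's exact objective, which operates inside the diffusion training loop.

```python
import torch
import torch.nn.functional as F

def frame_residuals(latents: torch.Tensor) -> torch.Tensor:
    """Residual (motion) vectors between consecutive frames.

    latents: (F, C, H, W) tensor of per-frame latents (assumed layout).
    Returns a (F-1, C, H, W) tensor of frame-to-frame differences.
    """
    return latents[1:] - latents[:-1]

def motion_distillation_loss(pred_latents: torch.Tensor,
                             target_latents: torch.Tensor) -> torch.Tensor:
    """Align predicted and reference frame residuals.

    A cosine-similarity alignment tracks the direction of motion
    rather than its absolute magnitude, which loosely matches the
    goal of keeping motion trajectories while ignoring
    motion-unrelated detail. This is a sketch, not the paper's loss.
    """
    pred_res = frame_residuals(pred_latents).flatten(1)    # (F-1, C*H*W)
    target_res = frame_residuals(target_latents).flatten(1)
    cos = F.cosine_similarity(pred_res, target_res, dim=1)  # (F-1,)
    return (1.0 - cos).mean()

# Toy usage: 8 frames of 4x32x32 latents with small perturbation.
target = torch.randn(8, 4, 32, 32)
pred = target + 0.1 * torch.randn_like(target)
loss = motion_distillation_loss(pred, target)
```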