VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
December 1, 2023
Authors: Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
cs.AI
Abstract
Text-to-video diffusion models have advanced video generation significantly.
However, customizing these models to generate videos with tailored motions
presents a substantial challenge. Specifically, they encounter hurdles in (a)
accurately reproducing motion from a target video, and (b) creating diverse
visual variations. For example, straightforward extensions of static image
customization methods to video often lead to intricate entanglements of
appearance and motion data. To tackle this, here we present the Video Motion
Customization (VMC) framework, a novel one-shot tuning approach crafted to
adapt temporal attention layers within video diffusion models. Our approach
introduces a novel motion distillation objective using residual vectors between
consecutive frames as a motion reference. The diffusion process then preserves
low-frequency motion trajectories while mitigating high-frequency
motion-unrelated noise in image space. We validate our method against
state-of-the-art video generative models across diverse real-world motions and
contexts. Our code, data, and project demo can be found at
https://video-motion-customization.github.io
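The two technical ideas named in the abstract, tuning only the temporal attention layers of a video diffusion model and distilling motion from residual vectors between consecutive frames, can be illustrated with a minimal PyTorch-style sketch. This is not the authors' released code; the module-naming convention, tensor layout, and function names below are assumptions made for illustration only.

```python
# Minimal sketch (not the VMC reference implementation) of the two ideas from the
# abstract: (1) adapt only temporal attention layers, (2) a motion distillation
# objective built on residuals between consecutive frames.
import torch
import torch.nn.functional as F


def freeze_all_but_temporal_attention(model, keyword="temporal"):
    """Train only parameters whose names contain `keyword`; freeze the rest.

    The `keyword` filter is an assumption; real models may name their
    temporal attention modules differently.
    """
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
    return [p for p in model.parameters() if p.requires_grad]


def frame_residuals(frames):
    """Residual vectors between consecutive frames, used as the motion reference.

    frames: (batch, num_frames, channels, height, width)
    returns: (batch, num_frames - 1, channels, height, width)
    """
    return frames[:, 1:] - frames[:, :-1]


def motion_distillation_loss(pred_frames, target_frames):
    """Align predicted frame-to-frame residuals with the target video's residuals,
    so appearance can vary while the motion trajectory is preserved."""
    return F.mse_loss(frame_residuals(pred_frames), frame_residuals(target_frames))
```

In a one-shot tuning loop of this kind, only the parameters returned by freeze_all_but_temporal_attention would be handed to the optimizer, and the loss would be applied to the model's frame predictions against the single target video.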