VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
December 1, 2023
Authors: Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
cs.AI
Abstract
Text-to-video diffusion models have advanced video generation significantly.
However, customizing these models to generate videos with tailored motions
presents a substantial challenge. Specifically, they encounter hurdles in (a)
accurately reproducing motion from a target video, and (b) creating diverse
visual variations. For example, straightforward extensions of static image
customization methods to video often lead to intricate entanglements of
appearance and motion data. To tackle this, here we present the Video Motion
Customization (VMC) framework, a novel one-shot tuning approach crafted to
adapt temporal attention layers within video diffusion models. Our approach
introduces a novel motion distillation objective using residual vectors between
consecutive frames as a motion reference. The diffusion process then preserves
low-frequency motion trajectories while mitigating high-frequency
motion-unrelated noise in image space. We validate our method against
state-of-the-art video generative models across diverse real-world motions and
contexts. Our code, data, and project demo can be found at
https://video-motion-customization.github.io
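The two technical ideas named in the abstract, tuning only the temporal attention layers of a video diffusion model and distilling motion from residual vectors between consecutive frames, can be illustrated with a minimal PyTorch-style sketch. This is not the authors' released code; the module-naming convention, tensor layout, and function names below are assumptions made for illustration only.

```python
# Minimal sketch (not the VMC reference implementation) of the two ideas from the
# abstract: (1) adapt only temporal attention layers, (2) a motion distillation
# objective built on residuals between consecutive frames.
import torch
import torch.nn.functional as F


def freeze_all_but_temporal_attention(model, keyword="temporal"):
    """Train only parameters whose names contain `keyword`; freeze the rest.

    The `keyword` filter is an assumption; real models may name their
    temporal attention modules differently.
    """
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
    return [p for p in model.parameters() if p.requires_grad]


def frame_residuals(frames):
    """Residual vectors between consecutive frames, used as the motion reference.

    frames: (batch, num_frames, channels, height, width)
    returns: (batch, num_frames - 1, channels, height, width)
    """
    return frames[:, 1:] - frames[:, :-1]


def motion_distillation_loss(pred_frames, target_frames):
    """Align predicted frame-to-frame residuals with the target video's residuals,
    so appearance can vary while the motion trajectory is preserved."""
    return F.mse_loss(frame_residuals(pred_frames), frame_residuals(target_frames))
```

In a one-shot tuning loop of this kind, only the parameters returned by freeze_all_but_temporal_attention would be handed to the optimizer, and the loss would be applied to the model's frame predictions against the single target video.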