

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

October 12, 2023
Authors: Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou
cs.AI

Abstract

Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse video generation tasks. Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion. For example, one might generate a video of a car moving in a prescribed manner under specific camera movements to make a movie, or a video illustrating how a bear would lift weights to inspire creators. Adaptation methods have been developed for customizing appearance, such as subject or style, but remain unexplored for motion. It is straightforward to extend mainstream adaptation methods to motion customization, including full model tuning, parameter-efficient tuning of additional layers, and Low-Rank Adaptations (LoRAs). However, the motion concept learned by these methods is often coupled with the limited appearances in the training videos, making it difficult to generalize the customized motion to other appearances. To overcome this challenge, we propose MotionDirector, with a dual-path LoRA architecture to decouple the learning of appearance and motion. Further, we design a novel appearance-debiased temporal loss to mitigate the influence of appearance on the temporal training objective. Experimental results show that the proposed method can generate videos of diverse appearances for the customized motions. Our method also supports various downstream applications, such as mixing the appearance and motion of different videos, and animating a single image with customized motions. Our code and model weights will be released.
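The abstract names Low-Rank Adaptations (LoRAs) as the building block of MotionDirector's dual-path architecture but does not spell out their form. As a rough illustration only, the sketch below shows the standard LoRA formulation: a frozen weight W plus a scaled low-rank update B·A, with B initialised to zero so the adapted layer starts out identical to the frozen one. It uses plain Python lists to stay self-contained. The class name `LoRALinear`, the `enabled` switch, and the initialisation scale are illustrative assumptions, not the authors' code.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an (n x k) matrix and a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a: Matrix, s: float) -> Matrix:
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


class LoRALinear:
    """Hypothetical LoRA-wrapped linear layer: y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) would be
    trained. B starts at zero, so the initial output equals W x.
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pre-trained weight, never updated here
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank
        self.enabled = True  # assumed switch for attaching/detaching the adapter

    def __call__(self, x: Matrix) -> Matrix:
        # x is a column vector stored as a (d_in x 1) matrix.
        y = matmul(self.weight, x)
        if self.enabled:
            low_rank = matmul(self.B, matmul(self.A, x))
            y = add(y, scale(low_rank, self.scaling))
        return y
```

For example, with an identity `W`, rank 1, `A = [[1.0, 1.0]]` and `B = [[1.0], [0.0]]`, the input `[[2.0], [3.0]]` maps to `[[7.0], [3.0]]`, and setting `enabled = False` recovers the frozen output `[[2.0], [3.0]]`. Because the update is additive and can be detached, separate sets of such adapters can in principle be trained on different inputs and kept or dropped independently; how MotionDirector wires its two paths is not specified in this abstract.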