

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

October 12, 2023
Authors: Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou
cs.AI

Abstract

Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse video generation. Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion. For example, generating a video with a car moving in a prescribed manner under specific camera movements to make a movie, or a video illustrating how a bear would lift weights to inspire creators. Adaptation methods have been developed for customizing appearance, such as subject or style, yet remain unexplored for motion. It is straightforward to extend mainstream adaptation methods to motion customization, including full model tuning, parameter-efficient tuning of additional layers, and Low-Rank Adaptations (LoRAs). However, the motion concept learned by these methods is often coupled with the limited appearances in the training videos, making it difficult to generalize the customized motion to other appearances. To overcome this challenge, we propose MotionDirector, with a dual-path LoRA architecture to decouple the learning of appearance and motion. Further, we design a novel appearance-debiased temporal loss to mitigate the influence of appearance on the temporal training objective. Experimental results show that the proposed method can generate videos of diverse appearances for the customized motions. Our method also supports various downstream applications, such as mixing the appearance of one video with the motion of another, and animating a single image with customized motions. Our code and model weights will be released.
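
The abstract names a dual-path LoRA architecture but gives no implementation details. As a rough, hypothetical illustration of the general idea (not the authors' released code), the PyTorch sketch below wraps a frozen linear layer with two independently switchable low-rank branches, one meant for appearance and one for motion; the class names, rank, and switching flags are assumptions made for illustration.

```python
# Hypothetical sketch only: a frozen base layer with two separate LoRA branches,
# illustrating how appearance and motion adapters could be kept decoupled.
import torch
import torch.nn as nn

class LoRABranch(nn.Module):
    """Standard low-rank adapter: (alpha / rank) * up(down(x))."""
    def __init__(self, in_features: int, out_features: int, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.down = nn.Linear(in_features, rank, bias=False)
        self.up = nn.Linear(rank, out_features, bias=False)
        self.scale = alpha / rank
        nn.init.normal_(self.down.weight, std=1e-3)
        nn.init.zeros_(self.up.weight)  # start with no change to the base output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x)) * self.scale

class DualPathLoRALinear(nn.Module):
    """Frozen base linear layer plus separate 'spatial' and 'temporal' LoRA branches."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the LoRA branches are trained
        self.spatial_lora = LoRABranch(base.in_features, base.out_features, rank)
        self.temporal_lora = LoRABranch(base.in_features, base.out_features, rank)
        self.use_spatial = True
        self.use_temporal = True

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        if self.use_spatial:
            out = out + self.spatial_lora(x)
        if self.use_temporal:
            out = out + self.temporal_lora(x)
        return out

# Usage on a toy projection layer (dimensions are arbitrary here).
layer = DualPathLoRALinear(nn.Linear(320, 320), rank=4)
x = torch.randn(2, 77, 320)
y = layer(x)                     # both branches active
layer.use_spatial = False
y_motion_only = layer(x)         # keep only the motion branch
```

Keeping the two branches separate is what would allow dropping the appearance branch at inference and retaining only the learned motion, in the spirit of the decoupling described in the abstract.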