MotionBooth: Motion-Aware Customized Text-to-Video Generation
June 25, 2024
Authors: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen
cs.AI
Abstract
In this work, we present MotionBooth, an innovative framework designed for
animating customized subjects with precise control over both object and camera
movements. By leveraging a few images of a specific object, we efficiently
fine-tune a text-to-video model to capture the object's shape and attributes
accurately. Our approach presents subject region loss and video preservation
loss to enhance the subject's learning performance, along with a subject token
cross-attention loss to integrate the customized subject with motion control
signals. Additionally, we propose training-free techniques for managing subject
and camera motions during inference. In particular, we utilize cross-attention
map manipulation to govern subject motion and introduce a novel latent shift
module for camera movement control as well. MotionBooth excels in preserving
the appearance of subjects while simultaneously controlling the motions in
generated videos. Extensive quantitative and qualitative evaluations
demonstrate the superiority and effectiveness of our method. Our project page
is at https://jianzongwu.github.io/projects/motionbooth.
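The latent shift idea described above can be illustrated with a minimal sketch: spatially shifting each frame's latent by a cumulative offset approximates a constant-velocity camera pan. The function name, tensor layout, and per-frame shift parameters below are hypothetical choices for illustration, not the paper's actual implementation.

```python
import numpy as np

def latent_shift(latents, dx_per_frame=2, dy_per_frame=0):
    """Approximate camera panning by shifting each frame's latent.

    latents: array of shape (T, C, H, W) -- per-frame latent features
             (layout assumed for this sketch).
    dx_per_frame, dy_per_frame: hypothetical per-frame shift in latent
             pixels, accumulated over time so the apparent camera
             motion has constant velocity.
    """
    shifted = np.empty_like(latents)
    for t in range(latents.shape[0]):
        # The cumulative shift grows linearly with the frame index;
        # np.roll wraps content around the border, which a real
        # implementation would handle more carefully (e.g. inpainting).
        shifted[t] = np.roll(
            latents[t],
            shift=(t * dy_per_frame, t * dx_per_frame),
            axis=(-2, -1),
        )
    return shifted

# Toy example: 4 frames, 1 channel, 4x4 latents.
lat = np.arange(4 * 1 * 4 * 4, dtype=float).reshape(4, 1, 4, 4)
out = latent_shift(lat, dx_per_frame=1)
```

In this toy run, frame 0 is unchanged and each later frame is shifted one latent pixel further, so a denoiser conditioned on these latents would tend to produce content that drifts consistently across frames, reading as camera motion.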