MotionBooth: Motion-Aware Customized Text-to-Video Generation
June 25, 2024
Authors: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen
cs.AI
Abstract
In this work, we present MotionBooth, an innovative framework designed for
animating customized subjects with precise control over both object and camera
movements. By leveraging a few images of a specific object, we efficiently
fine-tune a text-to-video model to capture the object's shape and attributes
accurately. Our approach introduces a subject region loss and a video
preservation loss to enhance subject learning, along with a subject token
cross-attention loss to integrate the customized subject with motion control
signals. Additionally, we propose training-free techniques for managing subject
and camera motions during inference. In particular, we utilize cross-attention
map manipulation to govern subject motion and introduce a novel latent shift
module for camera movement control. MotionBooth excels at preserving the
appearance of subjects while simultaneously controlling the motion in
generated videos. Extensive quantitative and qualitative evaluations
demonstrate the superiority and effectiveness of our method. Our project page
is at https://jianzongwu.github.io/projects/motionbooth.
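
The abstract names a "latent shift module" for training-free camera control but gives no implementation details. As a rough illustration of the general idea, the PyTorch sketch below translates each frame's latent progressively to emulate a camera pan. The function names, the wrap-around torch.roll, and the linear per-frame schedule are all assumptions for the sake of a runnable example, not the paper's actual module, which would additionally need to handle the newly exposed border region (e.g. by re-noising it).

```python
import torch


def latent_shift(latents: torch.Tensor, dx: int, dy: int) -> torch.Tensor:
    """Translate a latent tensor (..., height, width) by (dx, dy) latent pixels.

    torch.roll wraps content around the border; a real system would instead
    inpaint or re-noise the strip that the shift exposes.
    """
    return torch.roll(latents, shifts=(dy, dx), dims=(-2, -1))


def apply_camera_pan(
    latents: torch.Tensor, speed_x: float, speed_y: float
) -> torch.Tensor:
    """Shift frame t by t * speed, so later frames appear progressively moved.

    latents: (batch, channels, frames, height, width) video diffusion latents.
    The mapping from shift sign to perceived camera direction is a convention
    choice; here, rolling content to the right reads as the camera panning left.
    """
    b, c, f, h, w = latents.shape
    shifted = []
    for t in range(f):
        dx, dy = round(t * speed_x), round(t * speed_y)
        shifted.append(latent_shift(latents[:, :, t], dx, dy))
    return torch.stack(shifted, dim=2)


# Hypothetical usage on a 16-frame video latent:
latents = torch.randn(1, 4, 16, 40, 64)
panned = apply_camera_pan(latents, speed_x=2.0, speed_y=0.0)
```

In a full pipeline, a shift like this would be applied to intermediate denoising latents so the generated video inherits the panning motion, while a separate mechanism (the cross-attention map manipulation mentioned above) steers where the customized subject appears in each frame.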