

MotionBooth: Motion-Aware Customized Text-to-Video Generation

June 25, 2024
作者: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen
cs.AI

Abstract

In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately. Our approach introduces a subject region loss and a video preservation loss to enhance the subject's learning performance, along with a subject token cross-attention loss to integrate the customized subject with motion control signals. Additionally, we propose training-free techniques for managing subject and camera motions during inference. In particular, we utilize cross-attention map manipulation to govern subject motion and introduce a novel latent shift module to control camera movement. MotionBooth excels in preserving the appearance of subjects while simultaneously controlling the motions in generated videos. Extensive quantitative and qualitative evaluations demonstrate the superiority and effectiveness of our method. Our project page is at https://jianzongwu.github.io/projects/motionbooth
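
To make the training-free subject control concrete, below is a minimal PyTorch sketch of cross-attention map manipulation: it amplifies the subject token's attention inside a per-frame target box and suppresses it elsewhere, steering where the subject appears. The function name, box format, boost factor, and re-normalization step are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def amplify_subject_attention(
    attn: torch.Tensor,      # (frames, heads, H*W, tokens) cross-attention probabilities
    token_idx: int,          # position of the subject token in the prompt
    boxes: list,             # per-frame target boxes (x0, y0, x1, y1) in latent coordinates
    spatial_size: tuple,     # (H, W) of the latent grid
    boost: float = 2.0,      # illustrative amplification factor
) -> torch.Tensor:
    """Strengthen the subject token's attention inside a moving target box,
    steering where the subject appears in each generated frame."""
    H, W = spatial_size
    attn = attn.clone()
    for t, (x0, y0, x1, y1) in enumerate(boxes):
        mask = torch.zeros(H, W, dtype=torch.bool, device=attn.device)
        mask[y0:y1, x0:x1] = True
        inside = mask.flatten().view(1, -1)          # broadcast over heads
        a = attn[t, :, :, token_idx]                 # (heads, H*W)
        attn[t, :, :, token_idx] = torch.where(inside, a * boost, a / boost)
        # re-normalize so each spatial location's weights still sum to 1
        attn[t] = attn[t] / attn[t].sum(dim=-1, keepdim=True)
    return attn
```

In practice, an edit of this kind would be applied to the intermediate cross-attention maps at selected denoising steps, with the boxes moving frame to frame to trace the desired subject trajectory.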
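The latent shift idea for camera control can likewise be sketched as translating each frame's latent by a cumulative offset, so scene content drifts consistently across frames like a camera pan. This sketch assumes latents shaped (batch, channels, frames, height, width); the function name, the cumulative offset schedule, and the wrap-around `torch.roll` are illustrative stand-ins for the paper's module, which would also have to fill the newly exposed border rather than wrap it.

```python
import torch

def latent_shift(latents: torch.Tensor, dx: int, dy: int) -> torch.Tensor:
    """Translate each frame's latent by a cumulative per-frame offset,
    approximating a camera pan of (dx, dy) latent pixels per frame.
    latents: (batch, channels, frames, height, width)
    """
    frames = []
    for t in range(latents.shape[2]):
        # roll wraps content around the border; a real implementation
        # would inpaint or repad the exposed region instead
        frame = torch.roll(latents[:, :, t], shifts=(t * dy, t * dx), dims=(-2, -1))
        frames.append(frame)
    return torch.stack(frames, dim=2)
```

Applied to the noisy latents during early denoising steps, such a shift biases the generation toward a globally consistent camera translation without any additional training.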
