

MotionBooth: Motion-Aware Customized Text-to-Video Generation

June 25, 2024
作者: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen
cs.AI

Abstract

In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately. Our approach introduces a subject region loss and a video preservation loss to enhance the subject's learning performance, along with a subject token cross-attention loss to integrate the customized subject with motion control signals. Additionally, we propose training-free techniques for managing subject and camera motions during inference. In particular, we utilize cross-attention map manipulation to govern subject motion and introduce a novel latent shift module for camera movement control. MotionBooth excels at preserving the appearance of subjects while simultaneously controlling the motions in generated videos. Extensive quantitative and qualitative evaluations demonstrate the superiority and effectiveness of our method. Our project page is at https://jianzongwu.github.io/projects/motionbooth.
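The two training-free inference techniques named in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the paper's implementation: the function names, tensor shapes, the cumulative `np.roll` for the latent shift, and the boost-and-renormalize rule for attention editing are all assumptions. The first function shifts each frame's latent by a growing offset to emulate a panning camera; the second amplifies the subject token's cross-attention inside a target box so the subject is encouraged to appear there.

```python
import numpy as np

def latent_shift(latents: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Emulate camera panning by spatially shifting video latents.

    latents: (frames, channels, height, width). The shift t * (dy, dx)
    grows with the frame index, mimicking a constant camera velocity.
    (Illustrative only; the paper's latent shift module may differ.)
    """
    out = np.empty_like(latents)
    for t in range(latents.shape[0]):
        out[t] = np.roll(latents[t], shift=(t * dy, t * dx), axis=(-2, -1))
    return out

def boost_subject_attention(attn: np.ndarray, token_idx: int,
                            box: tuple, boost: float = 2.0) -> np.ndarray:
    """Steer subject motion by amplifying one token's cross-attention
    inside a target box, then renormalizing over the token axis.

    attn: (height, width, num_tokens); each spatial location sums to 1.
    box: (y0, y1, x0, x1) region where the subject should appear.
    """
    attn = attn.copy()
    y0, y1, x0, x1 = box
    attn[y0:y1, x0:x1, token_idx] *= boost
    return attn / attn.sum(axis=-1, keepdims=True)

# Toy demo: pan a random latent rightward and pull the subject token's
# attention toward the top-left quadrant.
latents = np.random.rand(4, 4, 16, 16)
panned = latent_shift(latents, dx=2, dy=0)

attn = np.full((8, 8, 5), 1.0 / 5)  # uniform attention over 5 tokens
edited = boost_subject_attention(attn, token_idx=1, box=(0, 4, 0, 4))
```

Renormalizing after the boost keeps each spatial location a valid distribution over tokens, so attention outside the target box is left untouched while the subject token dominates inside it.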

