
Magic-Me: Identity-Specific Video Customized Diffusion

February 14, 2024
Authors: Ze Ma, Daquan Zhou, Chun-Hsiao Yeh, Xue-She Wang, Xiuyu Li, Huanrui Yang, Zhen Dong, Kurt Keutzer, Jiashi Feng
cs.AI

Abstract

Creating content for a specific identity (ID) has attracted significant interest in the field of generative models. In text-to-image (T2I) generation, subject-driven content generation has made great progress, making the ID in generated images controllable. Extending this control to video generation, however, remains underexplored. In this work, we propose a simple yet effective framework for subject-identity-controllable video generation, termed Video Custom Diffusion (VCD). Given a subject ID defined by a few images, VCD strengthens identity information extraction and injects frame-wise correlation at the initialization stage, producing stable video outputs that preserve identity to a large extent. To achieve this, we propose three novel components that are essential for high-quality ID preservation: 1) an ID module trained on the identity cropped by prompt-to-segmentation, which disentangles ID information from background noise for more accurate ID token learning; 2) a text-to-video (T2V) VCD module with a 3D Gaussian Noise Prior for better inter-frame consistency; and 3) video-to-video (V2V) Face VCD and Tiled VCD modules, which deblur the face and upscale the video to a higher resolution. Despite the framework's simplicity, extensive experiments verify that VCD generates stable, high-quality videos with better ID preservation than the selected strong baselines. In addition, because the ID module is transferable, VCD also works well with publicly available fine-tuned text-to-image models, which further improves its usability. The code is available at https://github.com/Zhen-Dong/Magic-Me.
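The abstract does not spell out how the 3D Gaussian Noise Prior is constructed, so the following is only an illustrative sketch of the general idea of injecting frame-wise correlation into the initial noise. It assumes, purely for illustration, that the covariance between frames i and j decays as `gamma ** |i - j|`, which a first-order autoregressive recurrence produces while keeping unit variance per frame. The function name `correlated_frame_noise` and the parameters `gamma` and `frame_size` are hypothetical and are not taken from the paper or its repository.

```python
import math
import random


def correlated_frame_noise(num_frames, frame_size, gamma=0.5, seed=0):
    """Sample per-frame Gaussian noise that is correlated across frames.

    Assumption (not from the source): the covariance between frames i and j
    is gamma ** |i - j|. Each frame is a flat list of `frame_size` values.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    rng = random.Random(seed)
    # Scaling the fresh noise by sqrt(1 - gamma^2) keeps every frame at unit variance.
    scale = math.sqrt(1.0 - gamma * gamma)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(frame_size)]]
    for _ in range(1, num_frames):
        prev = frames[-1]
        # Blend the previous frame's noise with fresh independent noise.
        frames.append([gamma * p + scale * rng.gauss(0.0, 1.0) for p in prev])
    return frames


noise = correlated_frame_noise(num_frames=16, frame_size=64 * 64, gamma=0.5)
```

With `gamma = 0` the frames are independent, as in standard per-frame initialization; larger values make neighbouring frames start from more similar noise, which is the intuition behind using a correlated prior for inter-frame consistency.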
