Magic-Me: Identity-Specific Video Customized Diffusion
February 14, 2024
Authors: Ze Ma, Daquan Zhou, Chun-Hsiao Yeh, Xue-She Wang, Xiuyu Li, Huanrui Yang, Zhen Dong, Kurt Keutzer, Jiashi Feng
cs.AI
Abstract
Creating content for a specific identity (ID) has attracted significant interest in the field of generative models. In text-to-image generation (T2I), subject-driven content generation has made great progress, making the ID in generated images controllable. However, extending this to video generation remains under-explored. In this work, we propose a simple yet effective subject-identity-controllable video generation framework, termed Video Custom Diffusion (VCD). Given a subject ID specified by a few images, VCD reinforces identity-information extraction and injects frame-wise correlation at the initialization stage, yielding stable video outputs that preserve the identity to a large extent. To achieve this, we propose three novel components that are essential for high-quality ID preservation: 1) an ID module trained on identities cropped by prompt-to-segmentation, which disentangles the ID information from background noise for more accurate ID-token learning; 2) a text-to-video (T2V) VCD module with a 3D Gaussian Noise Prior for better inter-frame consistency; and 3) video-to-video (V2V) Face VCD and Tiled VCD modules that deblur the face and upscale the video to a higher resolution. Despite its simplicity, extensive experiments verify that VCD generates stable, high-quality videos with better ID preservation than the selected strong baselines. Moreover, owing to the transferability of the ID module, VCD also works well with publicly available fine-tuned text-to-image models, further improving its usability. The code is available at https://github.com/Zhen-Dong/Magic-Me.
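
The 3D Gaussian Noise Prior described in the abstract amounts to sampling the initial latents of all frames from a Gaussian whose covariance couples the frames. The snippet below is a minimal sketch of one way to realize such a correlated prior, by mixing a shared noise tensor with per-frame i.i.d. noise; the mixing weight `corr` and the function name are illustrative assumptions, not the paper's implementation.

```python
import torch

def correlated_noise_prior(frames, channels, height, width, corr=0.5, device="cpu"):
    """Sample initial latent noise whose frames are correlated.

    Each frame's noise is a mix of a shared base tensor and per-frame
    independent noise, so adjacent frames are denoised from similar
    starting points. `corr` is an illustrative hyperparameter.
    """
    shared = torch.randn(1, channels, height, width, device=device)
    independent = torch.randn(frames, channels, height, width, device=device)
    # Convex mixing in variance keeps each frame's noise unit-variance,
    # while the shared term gives a cross-frame covariance of `corr`.
    return (corr ** 0.5) * shared + ((1.0 - corr) ** 0.5) * independent

# Example: 16 frames of 4x64x64 latent noise for a video diffusion model.
z0 = correlated_noise_prior(frames=16, channels=4, height=64, width=64)
print(z0.shape)  # torch.Size([16, 4, 64, 64])
```

Because every frame shares the same base tensor, neighboring frames start from similar latents, which is the intuition behind the improved inter-frame consistency.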
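The prompt-to-segmentation training of the ID module can likewise be pictured as a diffusion loss restricted to the subject region, so the learned ID tokens do not absorb background noise. The sketch below is an assumption-laden illustration: the binary subject mask is presumed to come from an external prompt-driven segmentation model, and the function name and loss weighting are hypothetical rather than taken from the released code.

```python
import torch
import torch.nn.functional as F

def masked_id_loss(noise_pred, noise_target, subject_mask):
    """Diffusion (noise-prediction) loss computed only over the subject region.

    subject_mask: (B, 1, H, W) binary mask of the identity, assumed to be
    produced by a prompt-to-segmentation model and resized here to the
    latent resolution of the predictions.
    """
    mask = F.interpolate(subject_mask.float(), size=noise_pred.shape[-2:], mode="nearest")
    per_pixel = F.mse_loss(noise_pred, noise_target, reduction="none")
    masked = per_pixel * mask  # zero out background contributions
    # Average only over subject pixels so background area does not dilute the signal.
    return masked.sum() / mask.sum().clamp(min=1.0)
```

Masking the loss in this way is one straightforward means of disentangling identity information from the background during ID-token learning.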