Inserting Anybody in Diffusion Models via Celeb Basis
June 1, 2023
Authors: Ge Yuan, Xiaodong Cun, Yong Zhang, Maomao Li, Chenyang Qi, Xintao Wang, Ying Shan, Huicheng Zheng
cs.AI
Abstract
There is strong demand for customizing pretrained large text-to-image
models, e.g., Stable Diffusion, to generate novel concepts, such as the
users themselves. However, a concept newly added by previous customization
methods often shows weaker composition ability than the original concepts,
even when several images are given during training. We thus propose a new
personalization method that seamlessly integrates a unique individual into
the pre-trained diffusion model using just one facial photograph and only
1024 learnable parameters, in under 3 minutes. As a result, we can
effortlessly generate stunning images of this person in any pose or
position, interacting with anyone and doing anything imaginable from text
prompts. To achieve this, we first analyze and build a well-defined celeb
basis from the embedding space of the pre-trained large text encoder. Then,
given one facial photo as the target identity, we generate its embedding by
optimizing the weights of this basis while locking all other parameters.
Empowered by the proposed celeb basis, the new identity in our customized
model shows better concept-composition ability than previous
personalization methods. Moreover, our model can learn several new
identities at once, and they can interact with each other, where previous
customization models fail. The code will be released.
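The core idea, learning only a small weight vector over a fixed basis extracted from the text encoder's embedding space, can be illustrated with a minimal NumPy sketch. Everything here is a stand-in: the random `celeb_embeddings`, the SVD-based basis construction, the `target` vector (which in the paper would come from a face photo through an identity pipeline), and the plain gradient-descent loop are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: the real text encoder's embedding width and the
# paper's basis size are larger; these are stand-ins.
embed_dim = 64
num_basis = 16

# "Celeb basis": a fixed, orthonormal set of directions in embedding space,
# built here by PCA (via SVD) over random stand-ins for the text-encoder
# embeddings of celebrity names.
celeb_embeddings = rng.normal(size=(100, embed_dim))
_, _, vt = np.linalg.svd(celeb_embeddings - celeb_embeddings.mean(axis=0),
                         full_matrices=False)
basis = vt[:num_basis]          # (num_basis, embed_dim), frozen

# Target identity embedding (a random placeholder for the embedding a face
# photo would yield in the real pipeline).
target = rng.normal(size=(embed_dim,))

# Only the coefficients over the basis are trained -- analogous to the
# paper's ~1024 learnable parameters; everything else stays locked.
weights = np.zeros(num_basis)
lr = 0.1
for _ in range(500):
    pred = weights @ basis                 # reconstructed identity embedding
    grad = 2 * basis @ (pred - target)     # d/dw of ||w @ basis - target||^2
    weights -= lr * grad

recon = weights @ basis
# The reconstruction can only capture the component of the target lying in
# the span of the basis, so a residual generally remains.
err = np.linalg.norm(recon - target)
```

Because the basis rows are orthonormal, the loop converges to the least-squares solution `weights = basis @ target`, i.e. the projection of the target onto the basis span. The appeal of restricting learning to this low-dimensional, well-structured subspace is that the new embedding stays in a region the text encoder already handles well, which is what the paper credits for the improved composition ability.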