
Inserting Anybody in Diffusion Models via Celeb Basis

June 1, 2023
作者: Ge Yuan, Xiaodong Cun, Yong Zhang, Maomao Li, Chenyang Qi, Xintao Wang, Ying Shan, Huicheng Zheng
cs.AI

Abstract

There is strong demand for customizing pretrained large text-to-image models, e.g., Stable Diffusion, to generate innovative concepts such as the users themselves. However, the concepts newly added by previous customization methods often show weaker combination abilities than the original ones, even when several images are provided during training. We therefore propose a new personalization method that seamlessly integrates a unique individual into the pre-trained diffusion model using just one facial photograph and only 1024 learnable parameters, in under 3 minutes. With it, we can effortlessly generate stunning images of this person in any pose or position, interacting with anyone and doing anything imaginable from text prompts. To achieve this, we first analyze and build a well-defined celeb basis from the embedding space of the pre-trained large text encoder. Then, given one facial photo as the target identity, we generate its embedding by optimizing the weights of this basis while locking all other parameters. Empowered by the proposed celeb basis, the new identity in our customized model shows a better concept combination ability than previous personalization methods. Moreover, our model can learn several new identities at once, and they can interact with each other where previous customization models fail. The code will be released.
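
The abstract describes representing a new identity as a weighted combination over a basis built from celebrity-name embeddings, with only the combination weights trained. The sketch below illustrates that idea under stated assumptions: the basis is fitted here with a simple PCA/SVD over name embeddings, and all function names, dimensions, and the two-token layout (2 x 512 = 1024 coefficients) are illustrative assumptions rather than the authors' released implementation.

```python
# Minimal sketch of the "celeb basis" idea: new identity = mean + weights @ basis,
# with only `weights` trainable. Assumed names and dimensions, not the paper's code.
import torch


def build_celeb_basis(celeb_name_embeddings: torch.Tensor, num_basis: int = 512):
    """Fit an orthogonal basis over text-encoder embeddings of celebrity names.

    celeb_name_embeddings: (N, D) tensor, one embedding per celebrity name.
    Returns the mean embedding (D,) and the top `num_basis` principal directions (num_basis, D).
    """
    mean = celeb_name_embeddings.mean(dim=0)
    centered = celeb_name_embeddings - mean
    # Rows of Vh are the principal axes of the centered embedding cloud.
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    return mean, vh[:num_basis]


class IdentityEmbedding(torch.nn.Module):
    """A new identity expressed in the celeb basis.

    Only `weights` is a trainable parameter (e.g., 2 tokens x 512 coefficients
    = 1024 parameters, matching the count quoted in the abstract); the text
    encoder and the diffusion model themselves stay frozen.
    """

    def __init__(self, mean: torch.Tensor, basis: torch.Tensor, num_tokens: int = 2):
        super().__init__()
        self.register_buffer("mean", mean)    # (D,)
        self.register_buffer("basis", basis)  # (K, D)
        self.weights = torch.nn.Parameter(torch.zeros(num_tokens, basis.shape[0]))

    def forward(self) -> torch.Tensor:
        # (num_tokens, D) token embeddings, to be injected in place of a
        # placeholder token (e.g., "<new-person>") in the text prompt.
        return self.mean + self.weights @ self.basis


# Training loop (omitted here): optimize only `IdentityEmbedding.weights` with
# the usual denoising loss on the single target photo, keeping everything else locked.
```

The design point the abstract emphasizes is that constraining the new embedding to the span of well-composed celebrity embeddings is what preserves concept combination ability, compared with optimizing a free-form token embedding.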