Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
January 28, 2024
作者: Jianxiang Lu, Cong Xie, Hui Guo
cs.AI
Abstract
As large-scale text-to-image generation models have made remarkable progress in the field of text-to-image generation, many fine-tuning methods have been proposed. However, these models often struggle with novel objects, especially in one-shot scenarios. Our proposed method addresses the challenges of generalizability and fidelity in an object-driven way, using only a single input image and object-specific regions of interest. To improve generalizability and mitigate overfitting, our paradigm initializes a prototypical embedding based on the object's appearance and its class before fine-tuning the diffusion model. During fine-tuning, we propose a class-characterizing regularization to preserve prior knowledge of object classes. To further improve fidelity, we introduce an object-specific loss, which can also be used to implant multiple objects. Overall, our object-driven method for implanting new objects integrates seamlessly with existing concepts while achieving high fidelity and generalization. Our method outperforms several existing works. The code will be released.
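
To make the abstract's three ingredients concrete, below is a minimal, hypothetical sketch (not the authors' released code) of how a prototypical embedding might be initialized from object-appearance features and a class embedding, and how a class-characterizing regularizer could be combined with an object-specific (masked) denoising loss during fine-tuning. All tensor shapes and helper names (`image_feats`, `class_embed`, `mask`, the blending weight `alpha`) are assumptions for illustration only.

```python
# Hypothetical sketch of the abstract's three components, not the paper's implementation.
import torch
import torch.nn.functional as F

def init_prototypical_embedding(image_feats: torch.Tensor,
                                class_embed: torch.Tensor,
                                alpha: float = 0.5) -> torch.Tensor:
    """Blend pooled object-appearance features with the class token embedding."""
    proto = alpha * image_feats.mean(dim=0) + (1.0 - alpha) * class_embed
    return proto.detach().clone().requires_grad_(True)

def class_characterizing_reg(trainable_embed: torch.Tensor,
                             frozen_class_embed: torch.Tensor) -> torch.Tensor:
    """Keep the trainable embedding close to the frozen class prior."""
    return F.mse_loss(trainable_embed, frozen_class_embed)

def object_specific_loss(noise_pred: torch.Tensor,
                         noise_target: torch.Tensor,
                         mask: torch.Tensor) -> torch.Tensor:
    """Weight the diffusion denoising loss inside the object's region of interest."""
    per_pixel = (noise_pred - noise_target) ** 2
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)

# Toy fine-tuning step with made-up shapes (latent 4x64x64, embedding dim 768).
image_feats = torch.randn(16, 768)      # pooled object-appearance features
class_embed = torch.randn(768)          # frozen embedding of the class word
proto = init_prototypical_embedding(image_feats, class_embed)

noise_pred = torch.randn(1, 4, 64, 64, requires_grad=True)   # stand-in for UNet output
noise_target = torch.randn(1, 4, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0           # object's region of interest

loss = (object_specific_loss(noise_pred, noise_target, mask)
        + 0.1 * class_characterizing_reg(proto, class_embed))
loss.backward()
```

In an actual fine-tuning loop, `noise_pred` would come from the diffusion UNet conditioned on a prompt containing the prototypical token, and the regularization weight (0.1 here) would be a tunable hyperparameter.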