Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
January 28, 2024
Authors: Jianxiang Lu, Cong Xie, Hui Guo
cs.AI
Abstract
As large-scale text-to-image generation models have made remarkable progress
in the field of text-to-image generation, many fine-tuning methods have been
proposed. However, these models often struggle with novel objects, especially
in one-shot scenarios. Our proposed method aims to address the challenges of
generalizability and fidelity in an object-driven way, using only a single
input image and the object-specific regions of interest. To improve
generalizability and mitigate overfitting, in our paradigm a prototypical
embedding is initialized based on the object's appearance and its class before
the diffusion model is fine-tuned. During fine-tuning, we propose a
class-characterizing regularization to preserve the prior knowledge of object
classes. To further improve fidelity, we introduce an object-specific loss,
which can also be used to implant multiple objects. Overall, our proposed
object-driven method for implanting new objects integrates seamlessly with
existing concepts while achieving high fidelity and generalization. Our method
outperforms several existing works. The code will be released.
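
The abstract names three components: a prototypical embedding used to initialize the new concept token, a class-characterizing regularization, and a mask-weighted object-specific loss. Below is a minimal PyTorch sketch of how these pieces might fit together; it is purely illustrative, since the paper's exact formulation is not given here. The function names, the equal-weight blend of prototype and class embedding, and the cosine-similarity form of the regularizer are all assumptions.

```python
import torch
import torch.nn.functional as F

def init_prototypical_embedding(image_features: torch.Tensor,
                                class_embedding: torch.Tensor) -> torch.Tensor:
    # Hypothetical initialization: average the object's image features into a
    # single prototype, then blend with the class word embedding so the new
    # token starts near a known concept (mitigating one-shot overfitting).
    # The 0.5/0.5 blend is an assumed weighting, not the paper's.
    prototype = image_features.mean(dim=0)
    return 0.5 * prototype + 0.5 * class_embedding

def class_characterizing_loss(new_token_emb: torch.Tensor,
                              class_emb: torch.Tensor) -> torch.Tensor:
    # Assumed form of the regularizer: penalize the learned token for
    # drifting too far from its class prior during fine-tuning.
    return 1.0 - F.cosine_similarity(new_token_emb, class_emb, dim=-1).mean()

def object_specific_loss(noise_pred: torch.Tensor,
                         noise_target: torch.Tensor,
                         object_mask: torch.Tensor) -> torch.Tensor:
    # Assumed form of the fidelity term: weight the standard diffusion
    # denoising loss by the object's region-of-interest mask so
    # reconstruction focuses on the object rather than the background.
    # noise_pred/noise_target: (B, C, H, W); object_mask: (B, 1, H, W).
    per_pixel = (noise_pred - noise_target).pow(2)
    return (per_pixel * object_mask).sum() / object_mask.sum().clamp_min(1.0)
```

In a fine-tuning loop, the total objective would presumably combine the masked denoising loss with a weighted class-characterizing term; the relative weights and any scheduling are details the abstract leaves open. With multiple objects, a separate mask per object would let the same object-specific loss be applied per region.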