Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
January 28, 2024
Authors: Jianxiang Lu, Cong Xie, Hui Guo
cs.AI
Abstract
As large-scale text-to-image generation models have made remarkable progress
in the field of text-to-image generation, many fine-tuning methods have been
proposed. However, these models often struggle with novel objects, especially
in one-shot scenarios. Our proposed method aims to address the challenges of
generalizability and fidelity in an object-driven way, using only a single
input image and the object-specific regions of interest. To improve
generalizability and mitigate overfitting, in our paradigm a prototypical
embedding is initialized based on the object's appearance and its class before
the diffusion model is fine-tuned. During fine-tuning, we propose a
class-characterizing regularization to preserve the prior knowledge of object
classes. To further improve fidelity, we introduce an object-specific loss,
which can also be used to implant multiple objects. Overall, our proposed
object-driven method for implanting new objects integrates seamlessly with
existing concepts while achieving high fidelity and generalization. Our method
outperforms several existing works. The code will be released.
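
The abstract names three components: a prototypical embedding used to initialize the new concept token, a class-characterizing regularization, and a mask-weighted object-specific loss. Below is a minimal PyTorch sketch of how these pieces might fit together; it is purely illustrative, since the paper's exact formulation is not given here. The function names, the equal-weight blend of prototype and class embedding, and the cosine-similarity form of the regularizer are all assumptions.

```python
import torch
import torch.nn.functional as F

def init_prototypical_embedding(image_features: torch.Tensor,
                                class_embedding: torch.Tensor) -> torch.Tensor:
    # Hypothetical initialization: average the object's image features into a
    # single prototype, then blend with the class word embedding so the new
    # token starts near a known concept (mitigating one-shot overfitting).
    # The 0.5/0.5 blend is an assumed weighting, not the paper's.
    prototype = image_features.mean(dim=0)
    return 0.5 * prototype + 0.5 * class_embedding

def class_characterizing_loss(new_token_emb: torch.Tensor,
                              class_emb: torch.Tensor) -> torch.Tensor:
    # Assumed form of the regularizer: penalize the learned token for
    # drifting too far from its class prior during fine-tuning.
    return 1.0 - F.cosine_similarity(new_token_emb, class_emb, dim=-1).mean()

def object_specific_loss(noise_pred: torch.Tensor,
                         noise_target: torch.Tensor,
                         object_mask: torch.Tensor) -> torch.Tensor:
    # Assumed form of the fidelity term: weight the standard diffusion
    # denoising loss by the object's region-of-interest mask so
    # reconstruction focuses on the object rather than the background.
    # noise_pred/noise_target: (B, C, H, W); object_mask: (B, 1, H, W).
    per_pixel = (noise_pred - noise_target).pow(2)
    return (per_pixel * object_mask).sum() / object_mask.sum().clamp_min(1.0)
```

In a fine-tuning loop, the total objective would presumably combine the masked denoising loss with a weighted class-characterizing term; the relative weights and any scheduling are details the abstract leaves open. With multiple objects, a separate mask per object would let the same object-specific loss be applied per region.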