Customizing Text-to-Image Models with a Single Image Pair
May 2, 2024
Authors: Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu
cs.AI
Abstract
Art reinterpretation is the practice of creating a variation of a reference
work, making a paired artwork that exhibits a distinct artistic style. We ask
if such an image pair can be used to customize a generative model to capture
the demonstrated stylistic difference. We propose Pair Customization, a new
customization method that learns stylistic difference from a single image pair
and then applies the acquired style to the generation process. Unlike existing
methods that learn to mimic a single concept from a collection of images, our
method captures the stylistic difference between paired images. This allows us
to apply a stylistic change without overfitting to the specific image content
in the examples. To address this new task, we employ a joint optimization
method that explicitly separates the style and content into distinct LoRA
weight spaces. We optimize these style and content weights to reproduce the
style and content images while encouraging their orthogonality. During
inference, we modify the diffusion process via a new style guidance based on
our learned weights. Both qualitative and quantitative experiments show that
our method can effectively learn style while avoiding overfitting to image
content, highlighting the potential of modeling such stylistic differences from
a single image pair.
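The abstract describes two ingredients: a joint optimization that keeps style and content in separate LoRA weight spaces while encouraging their orthogonality, and an inference-time style guidance term based on the learned weights. The snippet below is a minimal, self-contained sketch of those ideas in PyTorch on a single linear layer; it is not the authors' implementation, and names such as `LoRA`, `lambda_orth`, `style_guided_noise`, and the toy reconstruction targets are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): separate content / style LoRA updates
# trained jointly with an orthogonality penalty, plus a schematic style-guidance
# combination at inference time.
import torch
import torch.nn as nn


class LoRA(nn.Module):
    """Low-rank update delta_W = B @ A applied on top of a frozen weight."""

    def __init__(self, dim_in, dim_out, rank=4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, dim_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(dim_out, rank))

    def delta(self):
        return self.B @ self.A  # shape: (dim_out, dim_in)


dim = 64
base = nn.Linear(dim, dim)        # stands in for one frozen diffusion-model layer
for p in base.parameters():
    p.requires_grad_(False)

content_lora = LoRA(dim, dim)     # fit to reproduce the content (reference) image
style_lora = LoRA(dim, dim)       # fit to capture only the stylistic difference
opt = torch.optim.Adam(
    list(content_lora.parameters()) + list(style_lora.parameters()), lr=1e-3
)

# Toy targets standing in for denoising losses on the content / stylized images.
x = torch.randn(8, dim)
target_content = torch.randn(8, dim)
target_style = torch.randn(8, dim)
lambda_orth = 0.1                 # assumed weight for the orthogonality term

for _ in range(200):
    opt.zero_grad()
    # Content branch: frozen weights + content LoRA should explain the content image.
    out_c = x @ (base.weight + content_lora.delta()).T + base.bias
    # Style branch: frozen + content + style LoRA should explain the stylized image.
    out_s = x @ (base.weight + content_lora.delta() + style_lora.delta()).T + base.bias
    rec = (nn.functional.mse_loss(out_c, target_content)
           + nn.functional.mse_loss(out_s, target_style))
    # Orthogonality penalty: discourage the style update from re-encoding content.
    orth = (content_lora.delta() * style_lora.delta()).sum().pow(2)
    (rec + lambda_orth * orth).backward()
    opt.step()


def style_guided_noise(eps_content, eps_style, scale):
    """Schematic style guidance: push the content-only noise prediction toward the
    content+style prediction, analogous to classifier-free guidance."""
    return eps_content + scale * (eps_style - eps_content)
```

In the actual method the low-rank updates would be applied across the diffusion model's layers and the reconstruction terms would be denoising losses on the paired images; the toy targets above only stand in for those.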