Customizing Text-to-Image Models with a Single Image Pair
May 2, 2024
Authors: Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu
cs.AI
Abstract
Art reinterpretation is the practice of creating a variation of a reference
work, making a paired artwork that exhibits a distinct artistic style. We ask
if such an image pair can be used to customize a generative model to capture
the demonstrated stylistic difference. We propose Pair Customization, a new
customization method that learns stylistic difference from a single image pair
and then applies the acquired style to the generation process. Unlike existing
methods that learn to mimic a single concept from a collection of images, our
method captures the stylistic difference between paired images. This allows us
to apply a stylistic change without overfitting to the specific image content
in the examples. To address this new task, we employ a joint optimization
method that explicitly separates the style and content into distinct LoRA
weight spaces. We optimize these style and content weights to reproduce the
style and content images while encouraging their orthogonality. During
inference, we modify the diffusion process via a new style guidance based on
our learned weights. Both qualitative and quantitative experiments show that
our method can effectively learn style while avoiding overfitting to image
content, highlighting the potential of modeling such stylistic differences from
a single image pair.
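The abstract describes two ingredients: a joint optimization that keeps style and content in separate LoRA weight spaces while encouraging their orthogonality, and an inference-time style guidance term based on the learned weights. The snippet below is a minimal, self-contained sketch of those ideas in PyTorch on a single linear layer; it is not the authors' implementation, and names such as `LoRA`, `lambda_orth`, `style_guided_noise`, and the toy reconstruction targets are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): separate content / style LoRA updates
# trained jointly with an orthogonality penalty, plus a schematic style-guidance
# combination at inference time.
import torch
import torch.nn as nn


class LoRA(nn.Module):
    """Low-rank update delta_W = B @ A applied on top of a frozen weight."""

    def __init__(self, dim_in, dim_out, rank=4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, dim_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(dim_out, rank))

    def delta(self):
        return self.B @ self.A  # shape: (dim_out, dim_in)


dim = 64
base = nn.Linear(dim, dim)        # stands in for one frozen diffusion-model layer
for p in base.parameters():
    p.requires_grad_(False)

content_lora = LoRA(dim, dim)     # fit to reproduce the content (reference) image
style_lora = LoRA(dim, dim)       # fit to capture only the stylistic difference
opt = torch.optim.Adam(
    list(content_lora.parameters()) + list(style_lora.parameters()), lr=1e-3
)

# Toy targets standing in for denoising losses on the content / stylized images.
x = torch.randn(8, dim)
target_content = torch.randn(8, dim)
target_style = torch.randn(8, dim)
lambda_orth = 0.1                 # assumed weight for the orthogonality term

for _ in range(200):
    opt.zero_grad()
    # Content branch: frozen weights + content LoRA should explain the content image.
    out_c = x @ (base.weight + content_lora.delta()).T + base.bias
    # Style branch: frozen + content + style LoRA should explain the stylized image.
    out_s = x @ (base.weight + content_lora.delta() + style_lora.delta()).T + base.bias
    rec = (nn.functional.mse_loss(out_c, target_content)
           + nn.functional.mse_loss(out_s, target_style))
    # Orthogonality penalty: discourage the style update from re-encoding content.
    orth = (content_lora.delta() * style_lora.delta()).sum().pow(2)
    (rec + lambda_orth * orth).backward()
    opt.step()


def style_guided_noise(eps_content, eps_style, scale):
    """Schematic style guidance: push the content-only noise prediction toward the
    content+style prediction, analogous to classifier-free guidance."""
    return eps_content + scale * (eps_style - eps_content)
```

In the actual method the low-rank updates would be applied across the diffusion model's layers and the reconstruction terms would be denoising losses on the paired images; the toy targets above only stand in for those.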