Customizing Text-to-Image Models with a Single Image Pair

May 2, 2024
Authors: Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu
cs.AI

Abstract

Art reinterpretation is the practice of creating a variation of a reference work, making a paired artwork that exhibits a distinct artistic style. We ask if such an image pair can be used to customize a generative model to capture the demonstrated stylistic difference. We propose Pair Customization, a new customization method that learns stylistic difference from a single image pair and then applies the acquired style to the generation process. Unlike existing methods that learn to mimic a single concept from a collection of images, our method captures the stylistic difference between paired images. This allows us to apply a stylistic change without overfitting to the specific image content in the examples. To address this new task, we employ a joint optimization method that explicitly separates the style and content into distinct LoRA weight spaces. We optimize these style and content weights to reproduce the style and content images while encouraging their orthogonality. During inference, we modify the diffusion process via a new style guidance based on our learned weights. Both qualitative and quantitative experiments show that our method can effectively learn style while avoiding overfitting to image content, highlighting the potential of modeling such stylistic differences from a single image pair.
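
The abstract names three ingredients: separate style and content LoRA weights, an orthogonality term between them, and a style guidance term at inference. The sketch below illustrates one plausible reading of each in plain Python. It is not the authors' implementation. The abstract gives no formulas, so the function names, the choice to penalize overlap between the LoRA down-projection rows, and the specific form of the guidance sum are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def lora_delta(up: Matrix, down: Matrix) -> Matrix:
    """Low-rank weight update: up (d x r) times down (r x k)."""
    return matmul(up, down)


def orthogonality_penalty(style_down: Matrix, content_down: Matrix) -> float:
    """Assumed penalty: squared Frobenius norm of content_down @ style_down^T.

    It is zero exactly when every content row is orthogonal to every style row.
    """
    overlap = matmul(content_down, transpose(style_down))
    return sum(v * v for row in overlap for v in row)


def style_guided_noise(
    eps_uncond: List[float],
    eps_text: List[float],
    eps_style: List[float],
    text_scale: float,
    style_scale: float,
) -> List[float]:
    """Assumed guidance form: classifier-free guidance plus an extra style term.

    eps_style stands for the noise prediction with the style LoRA applied,
    eps_text for the prediction without it; both are hypothetical inputs.
    """
    return [
        u + text_scale * (t - u) + style_scale * (s - t)
        for u, t, s in zip(eps_uncond, eps_text, eps_style)
    ]
```

Under these assumptions, setting `style_scale` to zero recovers ordinary classifier-free guidance, and a training loss would add `orthogonality_penalty` (with some weight) to the two reconstruction losses for the style and content images.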