Customizing Text-to-Image Models with a Single Image Pair
May 2, 2024
Authors: Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu
cs.AI
Abstract
Art reinterpretation is the practice of creating a variation of a reference work, making a paired artwork that exhibits a distinct artistic style. We ask if such an image pair can be used to customize a generative model to capture the demonstrated stylistic difference. We propose Pair Customization, a new customization method that learns stylistic difference from a single image pair and then applies the acquired style to the generation process. Unlike existing methods that learn to mimic a single concept from a collection of images, our method captures the stylistic difference between paired images. This allows us to apply a stylistic change without overfitting to the specific image content in the examples. To address this new task, we employ a joint optimization method that explicitly separates the style and content into distinct LoRA weight spaces. We optimize these style and content weights to reproduce the style and content images while encouraging their orthogonality. During inference, we modify the diffusion process via a new style guidance based on our learned weights. Both qualitative and quantitative experiments show that our method can effectively learn style while avoiding overfitting to image content, highlighting the potential of modeling such stylistic differences from a single image pair.
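
The abstract describes two technical components: a joint optimization that keeps style and content in separate LoRA weight spaces while encouraging their orthogonality, and an inference-time style guidance applied to the diffusion process. The sketch below is a minimal, hypothetical PyTorch illustration of what such an objective and guidance step could look like; it is not the authors' released implementation. The function names, the Frobenius-overlap orthogonality penalty, and the hyperparameters (`ortho_weight`, `style_scale`) are illustrative assumptions.

```python
# Hypothetical sketch of the two ideas named in the abstract, assuming LoRA
# updates are stored as low-rank factor pairs (A, B) per layer.

import torch
import torch.nn.functional as F


def lora_delta(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Low-rank weight update: Delta W = B @ A, with A of shape (r, d_in)
    and B of shape (d_out, r)."""
    return B @ A


def joint_loss(noise_pred_content, noise_pred_styled,
               noise_gt_content, noise_gt_styled,
               content_loras, style_loras, ortho_weight=0.1):
    """Joint objective (illustrative): reconstruct the content image using the
    content LoRA only, reconstruct the styled image using content + style LoRA,
    and penalize overlap between the two weight updates so they stay close to
    orthogonal. The overlap measure here (normalized Frobenius inner product)
    is one possible choice, not necessarily the paper's."""
    recon = F.mse_loss(noise_pred_content, noise_gt_content) \
          + F.mse_loss(noise_pred_styled, noise_gt_styled)

    ortho = 0.0
    for (Ac, Bc), (As, Bs) in zip(content_loras, style_loras):
        dWc = lora_delta(Ac, Bc)
        dWs = lora_delta(As, Bs)
        ortho = ortho + (dWc * dWs).sum().abs() / (dWc.norm() * dWs.norm() + 1e-8)
    return recon + ortho_weight * ortho


def style_guided_noise(eps_content_only: torch.Tensor,
                       eps_content_plus_style: torch.Tensor,
                       style_scale: float = 3.0) -> torch.Tensor:
    """Inference-time style guidance written by analogy to classifier-free
    guidance: extrapolate from the content-only noise prediction toward the
    prediction made with both content and style LoRA weights applied."""
    return eps_content_only + style_scale * (eps_content_plus_style - eps_content_only)
```

In this reading, the guidance step mirrors classifier-free guidance: the content-only prediction plays the role of the unconditional branch, and the scale controls how strongly the learned stylistic difference is applied at each denoising step.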