Customizing Text-to-Image Models with a Single Image Pair

Abstract

Art reinterpretation is the practice of creating a variation of a reference work, making a paired artwork that exhibits a distinct artistic style. We ask whether such an image pair can be used to customize a generative model to capture the demonstrated stylistic difference. We propose Pair Customization, a new customization method that learns the stylistic difference from a single image pair and then applies the acquired style to the generation process. Unlike existing methods that learn to mimic a single concept from a collection of images, our method captures the stylistic difference between paired images. This allows us to apply a stylistic change without overfitting to the specific image content in the examples. To address this new task, we employ a joint optimization method that explicitly separates the style and content into distinct LoRA weight spaces. We optimize these style and content weights to reproduce the style and content images while encouraging their orthogonality. During inference, we modify the diffusion process via a new style guidance based on our learned weights. Both qualitative and quantitative experiments show that our method can effectively learn style while avoiding overfitting to image content, highlighting the potential of modeling such stylistic differences from a single image pair.
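The abstract does not spell out the style guidance formula, so the following is only a minimal sketch of one plausible form, written by analogy with classifier-free guidance. It assumes three noise predictions are available at each denoising step: one from the base model with an empty prompt, one from the base model with the text prompt, and one from the base model with the learned style LoRA weights applied. The function name `style_guided_noise` and the parameters `cfg_scale` and `style_scale` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def style_guided_noise(
    eps_uncond: Sequence[float],
    eps_text: Sequence[float],
    eps_style: Sequence[float],
    cfg_scale: float = 5.0,
    style_scale: float = 3.0,
) -> List[float]:
    """Combine three noise predictions into one guided prediction.

    Assumed inputs (illustrative, not the paper's exact formulation):
      eps_uncond: base model, empty prompt
      eps_text:   base model, text prompt
      eps_style:  base model plus style LoRA, same text prompt
    """
    if not (len(eps_uncond) == len(eps_text) == len(eps_style)):
        raise ValueError("all noise predictions must have the same length")

    # The first guidance term is ordinary classifier-free guidance.
    # The second term pushes along the direction that the style weights
    # add on top of the text-conditioned base prediction.
    return [
        u + cfg_scale * (t - u) + style_scale * (s - t)
        for u, t, s in zip(eps_uncond, eps_text, eps_style)
    ]


# Toy usage with two-element "noise" vectors.
guided = style_guided_noise(
    eps_uncond=[0.0, 1.0],
    eps_text=[1.0, 1.0],
    eps_style=[1.0, 2.0],
    cfg_scale=2.0,
    style_scale=3.0,
)
```

Under this assumed form, setting `style_scale` to zero recovers plain classifier-free guidance, which is one way a stylistic change could be applied with adjustable strength while the content stays tied to the text prompt.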
