

Personalized Residuals for Concept-Driven Text-to-Image Generation

May 21, 2024
Authors: Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz
cs.AI

Abstract

We present personalized residuals and localized attention-guided sampling for efficient concept-driven generation using text-to-image diffusion models. Our method first represents concepts by freezing the weights of a pretrained text-conditioned diffusion model and learning low-rank residuals for a small subset of the model's layers. The residual-based approach then directly enables application of our proposed sampling technique, which applies the learned residuals only in areas where the concept is localized via cross-attention and applies the original diffusion weights in all other regions. Localized sampling therefore combines the learned identity of the concept with the existing generative prior of the underlying diffusion model. We show that personalized residuals effectively capture the identity of a concept in ~3 minutes on a single GPU without the use of regularization images and with fewer parameters than previous models, and localized sampling allows using the original model as a strong prior for large parts of the image.
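The abstract combines two mechanisms: a learnable low-rank weight residual attached to a frozen pretrained layer, and a sampling-time blend that applies the personalized prediction only inside the cross-attention-localized concept region. Below is a minimal PyTorch sketch of both ideas under stated assumptions; the class name, rank, zero-initialization, and the `localized_blend` helper are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class PersonalizedResidualLinear(nn.Module):
    """Frozen pretrained linear layer plus a learnable low-rank residual.

    Effective weight: W_frozen + B @ A, where only A (rank x in_dim) and
    B (out_dim x rank) are trained. Rank and init are assumptions.
    """

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pretrained weights frozen
        out_dim, in_dim = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))  # zero init: residual starts as a no-op

    def forward(self, x: torch.Tensor, use_residual: bool = True) -> torch.Tensor:
        y = self.base(x)
        if use_residual:
            y = y + x @ (self.B @ self.A).T  # add the low-rank update
        return y


def localized_blend(eps_personalized: torch.Tensor,
                    eps_original: torch.Tensor,
                    concept_mask: torch.Tensor) -> torch.Tensor:
    """Blend two denoiser predictions with a per-pixel concept mask.

    `concept_mask` (values in [0, 1]) is assumed to come from thresholded
    cross-attention maps for the concept token; inside the mask the
    personalized prediction is used, elsewhere the original model's.
    """
    return concept_mask * eps_personalized + (1.0 - concept_mask) * eps_original
```

In this reading of the method, each sampling step would run the denoiser twice, once with the residuals enabled and once with them disabled, and blend the two noise predictions with `localized_blend`, so the original model's generative prior governs everything outside the concept region.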