

Personalized Residuals for Concept-Driven Text-to-Image Generation

May 21, 2024
Authors: Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz
cs.AI

Abstract

We present personalized residuals and localized attention-guided sampling for efficient concept-driven generation using text-to-image diffusion models. Our method first represents concepts by freezing the weights of a pretrained text-conditioned diffusion model and learning low-rank residuals for a small subset of the model's layers. The residual-based approach then directly enables application of our proposed sampling technique, which applies the learned residuals only in areas where the concept is localized via cross-attention and applies the original diffusion weights in all other regions. Localized sampling therefore combines the learned identity of the concept with the existing generative prior of the underlying diffusion model. We show that personalized residuals effectively capture the identity of a concept in ~3 minutes on a single GPU without the use of regularization images and with fewer parameters than previous models, and localized sampling allows using the original model as a strong prior for large parts of the image.
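The abstract combines two mechanisms: a learnable low-rank weight residual attached to a frozen pretrained layer, and a sampling-time blend that applies the personalized prediction only inside the cross-attention-localized concept region. Below is a minimal PyTorch sketch of both ideas under stated assumptions; the class name, rank, zero-initialization, and the `localized_blend` helper are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class PersonalizedResidualLinear(nn.Module):
    """Frozen pretrained linear layer plus a learnable low-rank residual.

    Effective weight: W_frozen + B @ A, where only A (rank x in_dim) and
    B (out_dim x rank) are trained. Rank and init are assumptions.
    """

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pretrained weights frozen
        out_dim, in_dim = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))  # zero init: residual starts as a no-op

    def forward(self, x: torch.Tensor, use_residual: bool = True) -> torch.Tensor:
        y = self.base(x)
        if use_residual:
            y = y + x @ (self.B @ self.A).T  # add the low-rank update
        return y


def localized_blend(eps_personalized: torch.Tensor,
                    eps_original: torch.Tensor,
                    concept_mask: torch.Tensor) -> torch.Tensor:
    """Blend two denoiser predictions with a per-pixel concept mask.

    `concept_mask` (values in [0, 1]) is assumed to come from thresholded
    cross-attention maps for the concept token; inside the mask the
    personalized prediction is used, elsewhere the original model's.
    """
    return concept_mask * eps_personalized + (1.0 - concept_mask) * eps_original
```

In this reading of the method, each sampling step would run the denoiser twice, once with the residuals enabled and once with them disabled, and blend the two noise predictions with `localized_blend`, so the original model's generative prior governs everything outside the concept region.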