

Personalized Residuals for Concept-Driven Text-to-Image Generation

May 21, 2024
作者: Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz
cs.AI

Abstract

We present personalized residuals and localized attention-guided sampling for efficient concept-driven generation using text-to-image diffusion models. Our method first represents concepts by freezing the weights of a pretrained text-conditioned diffusion model and learning low-rank residuals for a small subset of the model's layers. The residual-based approach then directly enables application of our proposed sampling technique, which applies the learned residuals only in areas where the concept is localized via cross-attention and applies the original diffusion weights in all other regions. Localized sampling therefore combines the learned identity of the concept with the existing generative prior of the underlying diffusion model. We show that personalized residuals effectively capture the identity of a concept in ~3 minutes on a single GPU without the use of regularization images and with fewer parameters than previous models, and localized sampling allows using the original model as a strong prior for large parts of the image.
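To make the two ideas concrete, here is a minimal PyTorch sketch of (1) a frozen layer wrapped with a trainable low-rank residual and (2) a cross-attention-masked blend of two denoising predictions. The class name, rank, zero-initialization, and the threshold-based mask are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of personalized residuals + localized blending.
# All names and hyperparameters here are illustrative assumptions.

import torch
import torch.nn as nn

class PersonalizedResidualLinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank residual:
    output = W x + (B A) x, where only A and B are learned."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        out_f, in_f = base.out_features, base.in_features
        # Low-rank factors; B starts at zero so the residual is initially a no-op.
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.enabled = True  # set False to recover the original model exactly

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        if self.enabled:
            out = out + x @ self.A.T @ self.B.T  # low-rank residual path
        return out


def localized_blend(eps_residual: torch.Tensor,
                    eps_original: torch.Tensor,
                    attn_map: torch.Tensor,
                    threshold: float = 0.5) -> torch.Tensor:
    """Blend two denoising predictions using a cross-attention map for the
    concept token: personalized weights inside the concept region, original
    weights elsewhere. `attn_map` is assumed normalized to [0, 1] and
    broadcastable to the predictions' spatial shape."""
    mask = (attn_map > threshold).float()
    return mask * eps_residual + (1.0 - mask) * eps_original
```

In this reading, each sampling step would run the denoiser twice, once with the residuals enabled and once with them disabled, and blend the two predictions with `localized_blend`, so the original model acts as the prior everywhere the concept is not localized.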