Personalized Residuals for Concept-Driven Text-to-Image Generation

Abstract

We present personalized residuals and localized attention-guided sampling for efficient concept-driven generation with text-to-image diffusion models. Our method first represents a concept by freezing the weights of a pretrained text-conditioned diffusion model and learning low-rank residuals for a small subset of the model's layers. This residual-based approach directly enables our proposed sampling technique, which applies the learned residuals only in the regions where cross-attention localizes the concept and uses the original diffusion weights everywhere else. Localized sampling therefore combines the learned identity of the concept with the existing generative prior of the underlying diffusion model. We show that personalized residuals effectively capture the identity of a concept in about 3 minutes on a single GPU, without regularization images and with fewer parameters than previous models. Localized sampling, in turn, allows the original model to serve as a strong prior for large parts of the image.
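
The two ideas in the abstract can be illustrated with a minimal NumPy sketch for a single linear layer. This is not the authors' implementation: the function names, the tensor shapes, the single-layer setting, and the fixed attention threshold used to build the mask are all assumptions made for illustration. The frozen weight `w_frozen` is never modified. The concept is carried by the low-rank product `a @ b`, and a mask derived from the concept token's cross-attention decides, position by position, whether the personalized or the original weights produce the output.

```python
import numpy as np


def residual_param_count(d_in, d_out, rank):
    """Number of trainable values in a rank-`rank` residual A @ B.

    A has shape (d_in, rank) and B has shape (rank, d_out), so the count is
    rank * (d_in + d_out) instead of d_in * d_out for a full weight update.
    """
    return rank * (d_in + d_out)


def localized_blend(x, w_frozen, a, b, attn, threshold=0.5):
    """Apply a low-rank residual only where the concept is localized.

    x         : (n_positions, d_in) features, one row per spatial position
    w_frozen  : (d_in, d_out) pretrained weight, kept fixed
    a, b      : (d_in, rank) and (rank, d_out) learned low-rank factors
    attn      : (n_positions,) cross-attention scores for the concept token
    threshold : hypothetical cutoff used to binarize the attention map
    """
    base = x @ w_frozen                      # output of the original model
    personalized = x @ (w_frozen + a @ b)    # output with the residual added
    # Assumption: a hard 0/1 mask from thresholded attention scores.
    mask = (attn >= threshold).astype(x.dtype)[:, None]
    # Personalized output inside the mask, original output elsewhere.
    return mask * personalized + (1.0 - mask) * base
```

With `d_in = d_out = 8` and `rank = 2`, the residual has 32 trainable values compared with 64 for a full update of the same layer. The gap widens as the layer grows, which is consistent with the abstract's emphasis on a small parameter count.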