Interpreting the Weight Space of Customized Diffusion Models

Abstract

We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's visual identity. We model the underlying manifold of these weights as a subspace, which we term weights2weights. We demonstrate three immediate applications of this space: sampling, editing, and inversion. First, as each point in the space corresponds to an identity, sampling a set of weights from it results in a model encoding a novel identity. Next, we find linear directions in this space corresponding to semantic edits of the identity (e.g., adding a beard). These edits persist in appearance across generated samples. Finally, we show that inverting a single image into this space reconstructs a realistic identity, even if the input image is out of distribution (e.g., a painting). Our results indicate that the weight space of fine-tuned diffusion models behaves as an interpretable latent space of identities.
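
To make the sampling and editing operations concrete, the following is a minimal sketch on toy data, not the authors' implementation. It assumes the subspace is obtained with PCA over flattened weight vectors (the abstract does not say how the subspace is fit), uses random vectors in place of real fine-tuned weights, and takes a principal direction as a stand-in for a learned semantic direction. All function names are hypothetical. Inversion is omitted because it requires optimizing the coefficients against a diffusion objective on the input image.

```python
import numpy as np


def fit_subspace(weights, k):
    """Fit a k-dimensional linear subspace to the rows of `weights` with PCA."""
    mean = weights.mean(axis=0)
    # Rows of vt are orthonormal principal directions in weight space.
    _, s, vt = np.linalg.svd(weights - mean, full_matrices=False)
    basis = vt[:k]
    # Standard deviation of the training coefficients along each direction.
    std = s[:k] / np.sqrt(len(weights) - 1)
    return mean, basis, std


def project(w, mean, basis):
    """Coefficients of weight vector `w` in the subspace."""
    return (w - mean) @ basis.T


def reconstruct(coeffs, mean, basis):
    """Map subspace coefficients back to a full weight vector."""
    return mean + coeffs @ basis


def sample(mean, basis, std, rng):
    """Draw coefficients from a per-direction Gaussian and decode them."""
    coeffs = rng.normal(0.0, std)
    return reconstruct(coeffs, mean, basis)


def edit(w, direction, alpha):
    """Move `w` by `alpha` along a (normalized) linear direction."""
    return w + alpha * direction / np.linalg.norm(direction)


# Toy stand-in for a dataset of fine-tuned models: 200 weight vectors of
# dimension 32 that lie close to a 4-dimensional subspace (assumed sizes).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 4))
mixing = rng.normal(size=(4, 32))
weights = latent @ mixing + 0.01 * rng.normal(size=(200, 32))

mean, basis, std = fit_subspace(weights, k=4)

# Sampling: a new point in the subspace, i.e. a new set of weights.
new_w = sample(mean, basis, std, rng)

# Editing: shift an existing model along one direction. Here the first
# principal direction stands in for a semantic direction such as "beard".
edited_w = edit(weights[0], basis[0], alpha=2.0)
coeff_shift = project(edited_w, mean, basis) - project(weights[0], mean, basis)
```

In this toy setup the edit changes only the coefficient along the chosen direction and leaves the others untouched, which is the sense in which a linear direction gives a localized edit.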
