Interpreting the Weight Space of Customized Diffusion Models
June 13, 2024
Authors: Amil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir Aberman
cs.AI
Abstract
We investigate the space of weights spanned by a large collection of
customized diffusion models. We populate this space by creating a dataset of
over 60,000 models, each of which is a base model fine-tuned to insert a
different person's visual identity. We model the underlying manifold of these
weights as a subspace, which we term weights2weights. We demonstrate three
immediate applications of this space: sampling, editing, and inversion.
First, as each point in the space corresponds to an identity, sampling a set of
weights from it results in a model encoding a novel identity. Next, we find
linear directions in this space corresponding to semantic edits of the identity
(e.g., adding a beard). These edits persist in appearance across generated
samples. Finally, we show that inverting a single image into this space
reconstructs a realistic identity, even if the input image is out of
distribution (e.g., a painting). Our results indicate that the weight space of
fine-tuned diffusion models behaves as an interpretable latent space of
identities.
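The subspace idea above can be illustrated with a minimal sketch, assuming each customized model is flattened into a single weight vector and the subspace is obtained with PCA over those vectors. Sampling fits an independent Gaussian to each principal coefficient. The edit direction is the normalized difference of class means in coefficient space for a hypothetical binary attribute (e.g., "has beard"), a simple stand-in for a trained linear classifier. All function names and the synthetic data are hypothetical illustrations, not the authors' code; inversion from an image is not shown.

```python
import numpy as np

def build_subspace(weights: np.ndarray, k: int):
    """PCA over flattened weights (hypothetical helper).

    weights: (n_models, d) array, one flattened weight vector per model.
    Returns the mean vector, k orthonormal principal directions (k, d),
    and each model's coefficients in that basis (n_models, k).
    """
    mean = weights.mean(axis=0)
    centered = weights - mean
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]
    coeffs = centered @ basis.T
    return mean, basis, coeffs

def to_weights(mean, basis, c):
    # Map subspace coefficients back to a full weight vector.
    return mean + c @ basis

def sample_weights(mean, basis, coeffs, rng):
    # Assumption: an independent Gaussian per principal coefficient.
    mu, sigma = coeffs.mean(axis=0), coeffs.std(axis=0)
    return to_weights(mean, basis, rng.normal(mu, sigma))

def edit_direction(coeffs, labels):
    # Difference of class means: a simple linear attribute direction.
    d = coeffs[labels == 1].mean(axis=0) - coeffs[labels == 0].mean(axis=0)
    return d / np.linalg.norm(d)

def edit_weights(mean, basis, c, direction, alpha):
    # Move one model's coefficients along the attribute direction.
    return to_weights(mean, basis, c + alpha * direction)

# Synthetic stand-in data: n "models" lying near a k-dimensional subspace.
rng = np.random.default_rng(0)
n, d, k = 200, 50, 5
latent = rng.normal(size=(n, k))
weights = latent @ rng.normal(size=(k, d)) + 0.01 * rng.normal(size=(n, d))
labels = (latent[:, 0] > 0).astype(int)  # hypothetical binary attribute

mean, basis, coeffs = build_subspace(weights, k)
new_w = sample_weights(mean, basis, coeffs, rng)
direction = edit_direction(coeffs, labels)
edited = edit_weights(mean, basis, coeffs[0], direction, alpha=2.0)
```

Because sampled and edited coefficient vectors are mapped back through the same orthonormal basis, the resulting weights stay inside the fitted subspace by construction.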