SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
February 3, 2025
Authors: Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nick Kolkin
cs.AI
Abstract
We present SliderSpace, a framework for automatically decomposing the visual
capabilities of diffusion models into controllable and human-understandable
directions. Unlike existing control methods that require a user to specify
attributes for each edit direction individually, SliderSpace discovers multiple
interpretable and diverse directions simultaneously from a single text prompt.
Each direction is trained as a low-rank adaptor, enabling compositional control
and the discovery of surprising possibilities in the model's latent space.
Through extensive experiments on state-of-the-art diffusion models, we
demonstrate SliderSpace's effectiveness across three applications: concept
decomposition, artistic style exploration, and diversity enhancement. Our
quantitative evaluation shows that SliderSpace-discovered directions decompose
the visual structure of the model's knowledge effectively, offering insights into
the latent capabilities encoded within diffusion models. User studies further
validate that our method produces more diverse and useful variations compared
to baselines. Our code, data and trained weights are available at
https://sliderspace.baulab.info
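The abstract notes that each discovered direction is trained as a low-rank adapter and that directions compose. A minimal numpy sketch of that composition idea (all names, shapes, and scale values here are illustrative assumptions, not the paper's actual implementation): each direction contributes a rank-r update B @ A to a frozen weight, scaled by its slider value.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 2

# A frozen weight matrix standing in for one layer of the diffusion model.
W = rng.standard_normal((d_out, d_in))

# Two hypothetical slider directions, each stored as a low-rank pair (B, A).
directions = [
    (rng.standard_normal((d_out, rank)), rng.standard_normal((rank, d_in)))
    for _ in range(2)
]

def apply_sliders(W, directions, scales):
    """Compose slider directions: W' = W + sum_i scale_i * (B_i @ A_i)."""
    W_new = W.copy()
    for (B, A), s in zip(directions, scales):
        W_new += s * (B @ A)
    return W_new

# Setting independent scales per direction gives compositional control;
# a scale of 0 leaves that direction inactive.
W_edit = apply_sliders(W, directions, scales=[0.8, -0.5])
```

Because each adapter is low-rank and additive, sliders can be mixed at arbitrary strengths (including negative) without retraining, which is what makes the per-direction scales behave like independent controls.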