

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

December 12, 2023
Authors: Sicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, Bolei Zhou
cs.AI

Abstract

Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-image (T2I) diffusion models. However, auxiliary modules have to be trained for each type of spatial condition, model architecture, and checkpoint, putting them at odds with the diverse intents and preferences a human designer would like to convey to the AI models during the content creation process. In this work, we present FreeControl, a training-free approach for controllable T2I generation that supports multiple conditions, architectures, and checkpoints simultaneously. FreeControl introduces structure guidance, which aligns the structure of generated images with a guidance image, and appearance guidance, which enables appearance sharing between images generated from the same seed. Extensive qualitative and quantitative experiments demonstrate the superior performance of FreeControl across a variety of pre-trained T2I models. In particular, FreeControl enables convenient training-free control over many different architectures and checkpoints, handles challenging input conditions on which most existing training-free methods fail, and achieves synthesis quality competitive with training-based approaches.
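
To make the guidance mechanism concrete, here is a minimal sketch of energy-guided sampling under stated assumptions: `denoiser` is a hypothetical callable returning both the noise prediction and intermediate U-Net features, and the two energy functions are plain L2 stand-ins for the paper's structure and appearance losses (FreeControl actually projects diffusion features onto a PCA-derived semantic basis). This is an illustration of the general technique, not the authors' implementation.

```python
# Hypothetical sketch of training-free guided diffusion sampling.
# `denoiser`, `structure_energy`, and `appearance_energy` are illustrative
# names, not FreeControl's actual API.
import torch

def structure_energy(feats, guide_feats):
    # Penalize deviation of intermediate features from those of the
    # guidance image (stand-in L2 distance; the paper uses features
    # projected onto a PCA basis instead).
    return ((feats - guide_feats) ** 2).mean()

def appearance_energy(feats, sibling_feats):
    # Encourage channel-wise feature statistics to match a "sibling"
    # image generated from the same seed without structure control.
    return ((feats.mean(dim=(-2, -1)) - sibling_feats.mean(dim=(-2, -1))) ** 2).mean()

def guided_step(x_t, t, denoiser, guide_feats, sibling_feats, w_s=1.0, w_a=0.2):
    # One denoising step with guidance: differentiate the combined energy
    # w.r.t. the noisy latent and nudge the noise estimate against it,
    # in the style of classifier guidance. Weights w_s and w_a are free
    # design choices.
    x_t = x_t.detach().requires_grad_(True)
    eps, feats = denoiser(x_t, t)  # noise prediction + intermediate features
    energy = (w_s * structure_energy(feats, guide_feats)
              + w_a * appearance_energy(feats, sibling_feats))
    grad = torch.autograd.grad(energy, x_t)[0]
    return eps + grad
```

Because the update only touches the sampling loop, the same procedure can be dropped into any pre-trained T2I diffusion model without retraining, which is what allows a single method to span conditions, architectures, and checkpoints.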