
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

December 12, 2023
作者: Sicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, Bolei Zhou
cs.AI

Abstract

Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-image (T2I) diffusion models. However, auxiliary modules have to be trained for each type of spatial condition, model architecture, and checkpoint, putting them at odds with the diverse intents and preferences a human designer would like to convey to the AI models during the content creation process. In this work, we present FreeControl, a training-free approach for controllable T2I generation that supports multiple conditions, architectures, and checkpoints simultaneously. FreeControl employs structure guidance to align the generated image's structure with a guidance image, and appearance guidance to enable appearance sharing between images generated from the same seed. Extensive qualitative and quantitative experiments demonstrate the superior performance of FreeControl across a variety of pre-trained T2I models. In particular, FreeControl enables convenient training-free control over many different architectures and checkpoints, handles challenging input conditions on which most existing training-free methods fail, and achieves synthesis quality competitive with training-based approaches.
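The abstract describes guidance-based sampling: at each denoising step, energy terms measuring structure mismatch (against the guidance image) and appearance mismatch (against a sibling generation from the same seed) are differentiated with respect to the latent, and the resulting gradient shifts the model's noise prediction, in the style of classifier guidance. Below is a minimal sketch of one such guided step, assuming a generic epsilon-prediction diffusion model. The `model(x, t) -> (eps, feats)` API, the feature targets, and the weight parameters are hypothetical stand-ins for illustration, not the paper's actual implementation (which derives its structure representation from the model's own intermediate features).

```python
import torch

def guided_denoising_step(x_t, t, model, guide_feats, seed_feats,
                          lambda_s=1.0, lambda_a=1.0):
    """One denoising step with structure and appearance guidance (sketch).

    x_t         : current noisy latent
    model       : hypothetical eps-prediction network returning intermediate
                  features as well: model(x, t) -> (eps, feats)
    guide_feats : features of the guidance image (structure target)
    seed_feats  : features from a sibling generation sharing the seed
                  (appearance target)
    """
    x_t = x_t.detach().requires_grad_(True)
    eps, feats = model(x_t, t)

    # Structure guidance: pull the current features toward those of the
    # guidance image.
    e_structure = (feats - guide_feats).pow(2).mean()

    # Appearance guidance: match spatially pooled feature statistics of the
    # same-seed sibling image, so appearance is shared but layout is not.
    e_appearance = (feats.mean(dim=(-2, -1)) -
                    seed_feats.mean(dim=(-2, -1))).pow(2).mean()

    energy = lambda_s * e_structure + lambda_a * e_appearance
    grad = torch.autograd.grad(energy, x_t)[0]

    # Shift the noise prediction along the energy gradient, as in classifier
    # guidance (the noise-schedule scale factor is omitted for brevity).
    return eps + grad
```

Because the guidance is computed at sampling time from the model's own features, no auxiliary module needs to be trained per condition, architecture, or checkpoint, which is what makes the approach training-free.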