X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Abstract

We introduce X-Adapter, a universal upgrader that enables pretrained plug-and-play modules (e.g., ControlNet, LoRA) to work directly with an upgraded text-to-image diffusion model (e.g., SDXL) without further retraining. We achieve this by training an additional network that controls the frozen upgraded model, using new text-image data pairs. Specifically, X-Adapter keeps a frozen copy of the old model to preserve the connectors of different plugins. In addition, X-Adapter adds trainable mapping layers that bridge the decoders of the two model versions for feature remapping. The remapped features are used as guidance for the upgraded model. To strengthen the guidance ability of X-Adapter, we employ a null-text training strategy for the upgraded model. After training, we also introduce a two-stage denoising strategy to align the initial latents of X-Adapter and the upgraded model. Thanks to these strategies, X-Adapter demonstrates universal compatibility with various plugins and also enables plugins of different versions to work together, thereby expanding the functionality available to the diffusion community. To verify the effectiveness of the proposed method, we conduct extensive experiments, and the results show that X-Adapter may facilitate wider application of the upgraded foundational diffusion model.
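
The feature remapping described above can be illustrated with a minimal sketch. The abstract does not specify the form of the mapping layers or how the remapped features enter the upgraded model, so everything here is an assumption made for illustration: a single linear map stands in for a trainable mapping layer, the guidance is injected by simple addition, and the names `remap`, `guide`, and `scale` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def remap(old_features: Vector, weight: Matrix) -> Vector:
    """Project old-model decoder features to the upgraded model's width.

    Assumption: a plain linear map stands in for a trainable mapping layer.
    """
    return [sum(w * f for w, f in zip(row, old_features)) for row in weight]


def guide(upgraded_features: Vector, old_features: Vector,
          weight: Matrix, scale: float = 1.0) -> Vector:
    """Add remapped old-model features to the upgraded decoder features.

    Assumption: additive guidance with a hypothetical strength `scale`.
    """
    mapped = remap(old_features, weight)
    if len(mapped) != len(upgraded_features):
        raise ValueError("mapped width must match the upgraded feature width")
    return [u + scale * m for u, m in zip(upgraded_features, mapped)]


# Toy usage: 2 old-model channels mapped onto 3 upgraded-model channels.
old = [1.0, 2.0]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
upgraded = [0.0, 0.0, 0.0]
guided = guide(upgraded, old, weight, scale=0.5)
```

In this toy setting the old model and the upgraded model stay untouched; only `weight` would be learned, which mirrors the idea that the plugins attach to the frozen old model while the mapping carries their effect across.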
