X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Abstract

We introduce X-Adapter, a universal upgrader that enables pretrained plug-and-play modules (e.g., ControlNet, LoRA) to work directly with an upgraded text-to-image diffusion model (e.g., SDXL) without further retraining. We achieve this by training an additional network that controls the frozen upgraded model, using new text-image data pairs. Specifically, X-Adapter keeps a frozen copy of the old model to preserve the connectors of different plugins. In addition, X-Adapter adds trainable mapping layers that bridge the decoders of the different model versions for feature remapping. The remapped features serve as guidance for the upgraded model. To strengthen the guidance ability of X-Adapter, we employ a null-text training strategy for the upgraded model. After training, we also introduce a two-stage denoising strategy to align the initial latents of X-Adapter and the upgraded model. With these strategies, X-Adapter demonstrates universal compatibility with various plugins and also enables plugins of different versions to work together, thereby expanding the functionality available to the diffusion community. To verify the effectiveness of the proposed method, we conduct extensive experiments, and the results show that X-Adapter may facilitate wider application in the upgraded foundational diffusion model.
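To make the feature remapping and the two-stage denoising idea more concrete, the following is a minimal toy sketch in plain Python. It is not the authors' implementation. All function names are hypothetical, the mapping layer is reduced to a single linear projection over a flat feature vector, the additive form of the guidance is an assumption, and the split into a base-model-only first stage followed by a joint second stage is likewise an assumption that the abstract does not spell out.

```python
from typing import List, Sequence, Tuple


def remap_features(weights: Sequence[Sequence[float]],
                   base_features: Sequence[float]) -> List[float]:
    """Project decoder features of the old (base) model into the
    feature width of the upgraded model.

    `weights` stands in for a trainable mapping layer and has shape
    (upgraded_dim, base_dim). A real mapper would operate on feature
    maps, not on a flat vector (assumption for illustration).
    """
    return [sum(w * f for w, f in zip(row, base_features)) for row in weights]


def guide(upgraded_features: Sequence[float],
          remapped: Sequence[float],
          scale: float = 1.0) -> List[float]:
    """Add the remapped features to the upgraded model's decoder
    features as guidance (additive guidance is an assumption)."""
    if len(upgraded_features) != len(remapped):
        raise ValueError("feature widths must match after remapping")
    return [u + scale * r for u, r in zip(upgraded_features, remapped)]


def plan_two_stage(timesteps: Sequence[int],
                   switch_timestep: int) -> Tuple[List[int], List[int]]:
    """Split a descending timestep schedule into two stages.

    Stage one (assumed): only the base model denoises.
    Stage two (assumed): base and upgraded model denoise together,
    with the remapped features guiding the upgraded model.
    """
    stage_one = [t for t in timesteps if t > switch_timestep]
    stage_two = [t for t in timesteps if t <= switch_timestep]
    return stage_one, stage_two


if __name__ == "__main__":
    mapper = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 2-dim base -> 3-dim upgraded
    remapped = remap_features(mapper, [2.0, 4.0])
    print(guide([1.0, 1.0, 1.0], remapped, scale=0.5))
    print(plan_two_stage([999, 749, 499, 249], switch_timestep=500))
```

In this toy setting the mapper weights would be the only trainable parameters, while both models stay frozen, which mirrors the division of roles described in the abstract.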
