ControlNeXt: Powerful and Efficient Control for Image and Video Generation

August 12, 2024
Authors: Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming-Chang Yang, Jiaya Jia
cs.AI

Abstract

Diffusion models have demonstrated remarkable and robust abilities in both image and video generation. To achieve greater control over generated results, researchers have introduced additional architectures, such as ControlNet, Adapters, and ReferenceNet, to integrate conditioning controls. However, current controllable generation methods often require substantial additional computational resources, especially for video generation, and either face challenges in training or exhibit weak control. In this paper, we propose ControlNeXt: a powerful and efficient method for controllable image and video generation. We first design a more straightforward and efficient architecture, replacing the heavy additional branches of prior methods with a lightweight module that adds minimal cost relative to the base model. This concise structure also allows our method to integrate seamlessly with other LoRA weights, enabling style alteration without additional training. For training, we reduce the number of learnable parameters by up to 90% compared to the alternatives. Furthermore, we propose another method called Cross Normalization (CN) as a replacement for "Zero-Convolution" to achieve fast and stable training convergence. We have conducted various experiments with different base models across images and videos, demonstrating the robustness of our method.
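The abstract only names Cross Normalization; as an illustration of the general idea, below is a minimal PyTorch-style sketch, assuming CN works by aligning the control branch's feature statistics with those of the denoising backbone before the two are fused. The function name `cross_normalize`, the choice of reduction dimensions, and the additive injection are assumptions made for illustration, not the paper's exact formulation.

```python
import torch

def cross_normalize(main_feat: torch.Tensor,
                    ctrl_feat: torch.Tensor,
                    eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical sketch of Cross Normalization: standardize the control
    features, then rescale them with the backbone's statistics so both
    branches share a feature distribution before fusion."""
    # Per-sample statistics over channel and spatial dimensions.
    dims = tuple(range(1, main_feat.dim()))
    m_mean = main_feat.mean(dim=dims, keepdim=True)
    m_std = main_feat.std(dim=dims, keepdim=True)
    c_mean = ctrl_feat.mean(dim=dims, keepdim=True)
    c_std = ctrl_feat.std(dim=dims, keepdim=True)
    # Align the control features to the backbone's distribution.
    aligned = (ctrl_feat - c_mean) / (c_std + eps) * m_std + m_mean
    # Additive injection into the main branch (assumed fusion rule).
    return main_feat + aligned

# Usage: fuse control features into a U-Net block's activations.
main = torch.randn(2, 320, 64, 64)   # backbone features
ctrl = torch.randn(2, 320, 64, 64)   # lightweight control-branch features
out = cross_normalize(main, ctrl)
```

Under this reading, the contrast with zero-convolution would be that the injected control signal is on the right scale from the first training step rather than growing slowly from zero, which could account for the fast and stable convergence the abstract claims.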
