

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

August 12, 2024
作者: Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming-Chang Yang, Jiaya Jia
cs.AI

Abstract

Diffusion models have demonstrated remarkable and robust abilities in both image and video generation. To achieve greater control over generated results, researchers have introduced additional architectures, such as ControlNet, Adapters, and ReferenceNet, to integrate conditioning controls. However, current controllable generation methods often require substantial additional computational resources, especially for video generation, and they either face training challenges or exhibit weak control. In this paper, we propose ControlNeXt: a powerful and efficient method for controllable image and video generation. We first design a more straightforward and efficient architecture that replaces the heavy additional branches with a minimal added cost relative to the base model. This concise structure also allows our method to integrate seamlessly with other LoRA weights, enabling style alteration without additional training. For training, we reduce the number of learnable parameters by up to 90% compared to alternative methods. Furthermore, we propose another method called Cross Normalization (CN) as a replacement for 'Zero Convolution' to achieve fast and stable training convergence. We have conducted various experiments with different base models across images and videos, demonstrating the robustness of our method.
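The core idea behind Cross Normalization, as described in the abstract, is to align the statistics of the control branch's features with those of the base model before they are combined, so training converges without a zero-initialized projection. A minimal illustrative sketch of this statistic-alignment idea is below; the function name, the 1-D feature vectors, and the `eps` parameter are assumptions for illustration, not the paper's actual implementation.

```python
import statistics

def cross_normalize(control_feats, main_feats, eps=1e-6):
    """Hedged sketch of a Cross Normalization step: standardize the
    control features with their own mean/std, then rescale them with
    the base branch's mean/std so both distributions match before the
    features are added. Real implementations operate on multi-channel
    tensors; 1-D lists are used here only to show the arithmetic."""
    c_mean = statistics.fmean(control_feats)
    c_std = statistics.pstdev(control_feats)
    m_mean = statistics.fmean(main_feats)
    m_std = statistics.pstdev(main_feats)
    # Standardize with the control branch's statistics, then map onto
    # the base branch's statistics.
    return [((x - c_mean) / (c_std + eps)) * m_std + m_mean
            for x in control_feats]
```

After this step, the control features share the base branch's mean and spread, so adding them does not shift the base model's activation statistics, which is the stability argument the abstract makes against heavy extra branches.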

