ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
April 11, 2024
Authors: Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen
cs.AI
Abstract
To enhance the controllability of text-to-image diffusion models, existing
efforts like ControlNet incorporated image-based conditional controls. In this
paper, we reveal that existing methods still face significant challenges in
generating images that align with the image conditional controls. To this end,
we propose ControlNet++, a novel approach that improves controllable generation
by explicitly optimizing pixel-level cycle consistency between generated images
and conditional controls. Specifically, for an input conditional control, we
use a pre-trained discriminative reward model to extract the corresponding
condition of the generated images, and then optimize the consistency loss
between the input conditional control and extracted condition. A
straightforward implementation would generate images from random noise
and then compute the consistency loss, but such an approach requires
storing gradients for multiple sampling timesteps, leading to considerable time
and memory costs. To address this, we introduce an efficient reward strategy
that deliberately disturbs the input images by adding noise, and then uses the
single-step denoised images for reward fine-tuning. This avoids the extensive
costs associated with image sampling, allowing for more efficient reward
fine-tuning. Extensive experiments show that ControlNet++ significantly
improves controllability under various conditional controls. For example, it
achieves improvements over ControlNet by 7.9% mIoU, 13.4% SSIM, and 7.6% RMSE,
respectively, for segmentation mask, line-art edge, and depth conditions.
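The efficient reward strategy described in the abstract (add noise to an input image, denoise it in a single step, then compare the condition extracted from the result with the input condition) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the noise schedule or the exact loss, so the sketch assumes the standard DDPM parameterization and a mean squared error. The names `add_noise`, `single_step_denoise`, `toy_reward_model`, and `consistency_loss` are hypothetical, and the "reward model" is a toy element-wise function standing in for a pre-trained discriminative network.

```python
import math
import random


def add_noise(x0, noise, alpha_bar_t):
    """Forward diffusion (assumed DDPM form): x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * n for x, n in zip(x0, noise)]


def single_step_denoise(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean image from x_t in one step, given the predicted noise."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [(x - b * e) / a for x, e in zip(x_t, eps_pred)]


def toy_reward_model(image):
    """Hypothetical stand-in for a pre-trained discriminative model (a soft 'mask')."""
    return [1.0 / (1.0 + math.exp(-p)) for p in image]


def consistency_loss(cond_input, cond_extracted):
    """Assumed loss: mean squared error between input and extracted conditions."""
    diffs = [(c - e) ** 2 for c, e in zip(cond_input, cond_extracted)]
    return sum(diffs) / len(diffs)


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0, 0.0]            # toy "image" (4 pixels)
cond_input = toy_reward_model(x0)     # condition paired with the input image
alpha_bar_t = 0.9                     # assumed small noise level
noise = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = add_noise(x0, noise, alpha_bar_t)

# A perfect noise prediction recovers x0, so the extracted condition matches.
x0_perfect = single_step_denoise(x_t, noise, alpha_bar_t)
loss_perfect = consistency_loss(cond_input, toy_reward_model(x0_perfect))

# A poor prediction (all zeros) leaves residual noise and a larger loss.
x0_bad = single_step_denoise(x_t, [0.0] * len(x0), alpha_bar_t)
loss_bad = consistency_loss(cond_input, toy_reward_model(x0_bad))

print(loss_perfect, loss_bad)
```

In actual fine-tuning the noise prediction would come from the diffusion model and the loss would be backpropagated through that single denoising step, which is what avoids storing gradients across many sampling timesteps; the sketch only shows the forward computation.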