ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Abstract

To enhance the controllability of text-to-image diffusion models, existing efforts such as ControlNet have incorporated image-based conditional controls. In this paper, we reveal that existing methods still face significant challenges in generating images that align with the image conditional controls. To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls. Specifically, for an input conditional control, we use a pre-trained discriminative reward model to extract the corresponding condition from the generated images, and then optimize the consistency loss between the input conditional control and the extracted condition. A straightforward implementation would be to generate images from random noise and then calculate the consistency loss, but such an approach requires storing gradients across multiple sampling timesteps, leading to considerable time and memory costs. To address this, we introduce an efficient reward strategy that deliberately disturbs the input images by adding noise and then uses the single-step denoised images for reward fine-tuning. This avoids the extensive costs associated with image sampling, allowing for more efficient reward fine-tuning. Extensive experiments show that ControlNet++ significantly improves controllability under various conditional controls. For example, it improves over ControlNet by 7.9% in mIoU, 13.4% in SSIM, and 7.6% in RMSE (i.e., a lower error) for segmentation mask, line-art edge, and depth conditions, respectively.
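The efficient reward strategy can be illustrated with a small, forward-only sketch. This is not the authors' implementation and rests on several assumptions: images are toy 1-D lists of floats, the noise schedule is an assumed linear DDPM schedule, `extract_condition` is a hypothetical thresholding stand-in for the pre-trained discriminative reward model, `eps_model` is a hypothetical noise predictor with signature `(x_t, cond, t)`, and mean squared error stands in for whatever task-specific consistency loss is used. It shows the key step: noise a real training image to timestep t, estimate the clean image with a single denoising step, and score that estimate's consistency with the input condition, rather than sampling an image from pure noise over many steps.

```python
import math
import random


def alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an assumed linear DDPM beta schedule."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, eps, ab):
    """Forward diffusion: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    return [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * e for a, e in zip(x0, eps)]


def single_step_denoise(xt, eps_pred, ab):
    """One-step estimate of the clean image: (x_t - sqrt(1 - ab) * eps_pred) / sqrt(ab)."""
    return [(x - math.sqrt(1.0 - ab) * e) / math.sqrt(ab) for x, e in zip(xt, eps_pred)]


def extract_condition(image, threshold=0.5):
    """Hypothetical reward-model stand-in: a binary 'mask' obtained by thresholding.

    A real reward model (e.g. a segmentation network) would be differentiable.
    """
    return [1.0 if v > threshold else 0.0 for v in image]


def consistency_loss(cond_in, cond_out):
    """Mean squared error between the input and extracted conditions (stand-in loss)."""
    return sum((a - b) ** 2 for a, b in zip(cond_in, cond_out)) / len(cond_in)


def reward_loss(x0, cond, eps_model, t, schedule, rng):
    """Noise a real image, denoise it once, and score consistency with `cond`."""
    ab = schedule[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = add_noise(x0, eps, ab)
    eps_pred = eps_model(xt, cond, t)  # a single forward pass, no multi-step sampling
    x0_hat = single_step_denoise(xt, eps_pred, ab)
    return consistency_loss(cond, extract_condition(x0_hat))
```

Because `eps_model` is called only once per example, a training loop built on this idea (in an autodiff framework, with a differentiable reward model) would only need to backpropagate through one denoising step instead of through the whole sampling chain. With a perfect noise predictor, `single_step_denoise` recovers the clean image exactly, so the loss reduces to that of the real image; a less accurate prediction yields a noisier `x0_hat` and typically a larger loss.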