CCM: Adding Conditional Controls to Text-to-Image Consistency Models
December 12, 2023
Authors: Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha
cs.AI
Abstract
Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality. However, how to add new conditional controls to pretrained CMs has not been explored. In this technical report, we consider alternative strategies for adding ControlNet-like conditional control to CMs and present three significant findings. 1) A ControlNet trained for diffusion models (DMs) can be directly applied to CMs for high-level semantic control, but it struggles with low-level detail and realism control. 2) CMs form an independent class of generative models, on which a ControlNet can be trained from scratch using the Consistency Training technique proposed by Song et al. 3) A lightweight adapter can be jointly optimized under multiple conditions through Consistency Training, allowing the swift transfer of a DM-based ControlNet to CMs. We study these three solutions across various conditional controls, including edge, depth, human pose, low-resolution images, and masked images, with text-to-image latent consistency models.
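To make finding 2 more concrete, below is a minimal sketch of how a Consistency Training style objective could be computed for a conditional (ControlNet-augmented) consistency model. This is not the paper's implementation: the callables `f_theta` and `f_ema`, the squared-error distance, and the simple `x0 + sigma * z` noising are assumptions for illustration; the actual CCM recipe, distance metric, and noise schedule are defined in the report itself.

```python
import torch
import torch.nn.functional as F

def consistency_training_loss(f_theta, f_ema, x0, control, sigmas, n):
    """One step of a Consistency Training style loss for a conditional model (sketch).

    f_theta / f_ema: callables (x_t, sigma, control) -> denoised prediction,
        where f_ema is a stop-gradient EMA copy of f_theta (hypothetical interface).
    x0:      clean latents, shape (B, C, H, W)
    control: conditioning input (e.g. edge or depth map) fed to the ControlNet branch
    sigmas:  1-D tensor of discretized noise levels, ascending
    n:       integer indices (shape (B,)) selecting adjacent levels sigma_n < sigma_{n+1}
    """
    z = torch.randn_like(x0)                      # shared noise for both noise levels
    sigma_n   = sigmas[n].view(-1, 1, 1, 1)
    sigma_np1 = sigmas[n + 1].view(-1, 1, 1, 1)

    x_np1 = x0 + sigma_np1 * z                    # noisier sample
    x_n   = x0 + sigma_n * z                      # less noisy sample

    pred_online = f_theta(x_np1, sigma_np1, control)
    with torch.no_grad():                         # target from the EMA / stop-grad copy
        pred_target = f_ema(x_n, sigma_n, control)

    # Squared error used here for simplicity; the paper may use a different distance.
    return F.mse_loss(pred_online, pred_target)
```

In this sketch, `f_ema` plays the role of the stop-gradient target network that Consistency Training uses in place of a pretrained diffusion teacher; in a ControlNet-like setup, only the control branch (and, for finding 3, the lightweight adapter) would receive gradients while the base consistency model stays frozen.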