UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
May 18, 2023
Authors: Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Huan Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, Yun Fu, Ran Xu
cs.AI
Abstract
Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such controls, which can accommodate various visual conditions in a single unified model, remains an unaddressed challenge. In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a singular framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts guide the style and context. To equip UniControl with the capacity to handle diverse visual conditions, we augment pretrained text-to-image diffusion models and introduce a task-aware HyperNet to modulate the diffusion models, enabling the adaptation to different C2I tasks simultaneously. Trained on nine unique C2I tasks, UniControl demonstrates impressive zero-shot generation abilities with unseen visual conditions. Experimental results show that UniControl often surpasses the performance of single-task-controlled methods of comparable model sizes. This control versatility positions UniControl as a significant advancement in the realm of controllable visual generation.
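
To make the task-aware HyperNet idea concrete, below is a minimal, illustrative PyTorch sketch of one plausible reading of the mechanism: a small network maps a task ID (one of the nine C2I tasks, e.g. edges or depth) to channel-wise scales that modulate the blocks of a conditioning branch feeding a shared backbone. The names here (TaskAwareHyperNet, ConditionBlock) are assumptions for illustration, not the paper's released implementation, and the sketch omits the pretrained Stable Diffusion backbone that the actual model modulates.

```python
# Illustrative sketch only: a task-aware HyperNet producing per-task
# modulation vectors for a conditioning branch. Class names are
# hypothetical, not taken from the UniControl codebase.
import torch
import torch.nn as nn

class TaskAwareHyperNet(nn.Module):
    """Maps a task ID to one channel-wise scale vector per block."""
    def __init__(self, num_tasks: int, task_dim: int, block_channels: list):
        super().__init__()
        self.task_embedding = nn.Embedding(num_tasks, task_dim)
        # One small MLP head per conditioning block.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(task_dim, task_dim), nn.SiLU(),
                          nn.Linear(task_dim, c))
            for c in block_channels
        ])

    def forward(self, task_id: torch.Tensor):
        e = self.task_embedding(task_id)         # (B, task_dim)
        return [head(e) for head in self.heads]  # list of (B, C_i)

class ConditionBlock(nn.Module):
    """A toy conditioning block whose output the HyperNet modulates."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.act = nn.SiLU()

    def forward(self, x, scale):
        h = self.act(self.conv(x))
        # Channel-wise modulation: the task decides how strongly each
        # feature channel contributes to the shared backbone.
        return h * scale[:, :, None, None]

# Usage: encode a 3-channel visual condition (e.g. an edge map) as task 0.
blocks = nn.ModuleList([ConditionBlock(3, 64), ConditionBlock(64, 128)])
hypernet = TaskAwareHyperNet(num_tasks=9, task_dim=32,
                             block_channels=[64, 128])

cond = torch.randn(2, 3, 64, 64)          # batch of visual conditions
scales = hypernet(torch.tensor([0, 0]))   # task 0 for both samples
h = cond
for block, s in zip(blocks, scales):
    h = block(h, s)
print(h.shape)  # torch.Size([2, 128, 64, 64]), features for the backbone
```

The design intuition this sketch captures is why one set of conditioning weights can serve many tasks: the convolutional blocks are shared across all C2I tasks, while the cheap task-conditioned scales steer which features dominate for a given condition type, which is also what makes generalization to unseen task embeddings plausible.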