OmniGen: Unified Image Generation
September 17, 2024
Authors: Shitao Xiao, Yueze Wang, Junjie Zhou, Huaying Yuan, Xingrun Xing, Ruiran Yan, Shuting Wang, Tiejun Huang, Zheng Liu
cs.AI
Abstract
In this work, we introduce OmniGen, a new diffusion model for unified image
generation. Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen
no longer requires additional modules such as ControlNet or IP-Adapter to
process diverse control conditions. OmniGen is characterized by the following
features: 1) Unification: OmniGen not only demonstrates text-to-image
generation capabilities but also inherently supports other downstream tasks,
such as image editing, subject-driven generation, and visual-conditional
generation. Additionally, OmniGen can handle classical computer vision tasks,
such as edge detection and human pose recognition, by transforming them into
image generation tasks. 2) Simplicity: The architecture of OmniGen is highly
simplified, eliminating the need for additional text encoders. Moreover, it is
more user-friendly compared to existing diffusion models, enabling complex
tasks to be accomplished through instructions without the need for extra
preprocessing steps (e.g., human pose estimation), thereby significantly
simplifying the workflow of image generation. 3) Knowledge Transfer: Through
learning in a unified format, OmniGen effectively transfers knowledge across
different tasks, manages unseen tasks and domains, and exhibits novel
capabilities. We also explore the model's reasoning capabilities and potential
applications of the chain-of-thought mechanism. This work represents the first
attempt at a general-purpose image generation model, and there remain several
unresolved issues. We will open-source the related resources at
https://github.com/VectorSpaceLab/OmniGen to foster advancements in this field.