
OmniGen: Unified Image Generation

September 17, 2024
Authors: Shitao Xiao, Yueze Wang, Junjie Zhou, Huaying Yuan, Xingrun Xing, Ruiran Yan, Shuting Wang, Tiejun Huang, Zheng Liu
cs.AI

Abstract

In this work, we introduce OmniGen, a new diffusion model for unified image generation. Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen does not require additional modules such as ControlNet or IP-Adapter to process diverse control conditions. OmniGen is characterized by the following features: 1) Unification: OmniGen not only demonstrates text-to-image generation capabilities but also inherently supports other downstream tasks, such as image editing, subject-driven generation, and visual-conditional generation. Additionally, OmniGen can handle classical computer vision tasks, such as edge detection and human pose recognition, by transforming them into image generation tasks. 2) Simplicity: The architecture of OmniGen is highly simplified and needs no additional text encoders. It is also more user-friendly than existing diffusion models: complex tasks can be accomplished through instructions alone, without extra preprocessing steps (e.g., human pose estimation), which significantly simplifies the image generation workflow. 3) Knowledge Transfer: By learning in a unified format, OmniGen effectively transfers knowledge across different tasks, handles unseen tasks and domains, and exhibits novel capabilities. We also explore the model's reasoning capabilities and potential applications of the chain-of-thought mechanism. This work represents the first attempt at a general-purpose image generation model, and several issues remain unresolved. We will open-source the related resources at https://github.com/VectorSpaceLab/OmniGen to foster advancements in this field.

