OmniGen2: Exploration to Advanced Multimodal Generation
June 23, 2025
Authors: Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, Ze Liu, Ziyi Xia, Chaofan Li, Haoge Deng, Jiahao Wang, Kun Luo, Bo Zhang, Defu Lian, Xinlong Wang, Zhongyuan Wang, Tiejun Huang, Zheng Liu
cs.AI
Abstract
In this work, we introduce OmniGen2, a versatile and open-source generative
model designed to provide a unified solution for diverse generation tasks,
including text-to-image, image editing, and in-context generation. Unlike
OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image
modalities, utilizing unshared parameters and a decoupled image tokenizer. This
design enables OmniGen2 to build upon existing multimodal understanding models
without the need to re-adapt VAE inputs, thereby preserving the original text
generation capabilities. To facilitate the training of OmniGen2, we develop
comprehensive data construction pipelines, encompassing image editing and
in-context generation data. Additionally, we introduce a reflection mechanism
tailored for image generation tasks and curate a dedicated reflection dataset
based on OmniGen2. Despite its relatively modest parameter size, OmniGen2
achieves competitive results on multiple task benchmarks, including
text-to-image and image editing. To further evaluate in-context generation,
also referred to as subject-driven tasks, we introduce a new benchmark named
OmniContext. OmniGen2 achieves state-of-the-art performance among open-source
models in terms of consistency. We will release our models, training code,
datasets, and data construction pipeline to support future research in this
field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link:
https://github.com/VectorSpaceLab/OmniGen2
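
To make the abstract's central architectural claim concrete, the sketch below illustrates what a decoupled dual-pathway design could look like: a text path that decodes from the understanding model's tokens with untouched parameters, and a separate image path that reads VAE latents through its own projection (standing in for the decoupled image tokenizer). This is a minimal, hypothetical PyTorch sketch; all class names, layer sizes, and the simple concatenation-based conditioning are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a decoupled dual-pathway generator (not the authors' code).
# Text and image decoding use unshared parameters; only the image path touches VAE latents.
import torch
import torch.nn as nn


class TextPathway(nn.Module):
    """Text decoding path; its parameters are kept separate so the original
    text-generation ability of the understanding model is preserved."""

    def __init__(self, hidden=1024, vocab=32000):
        super().__init__()
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, vit_tokens):                      # (B, N, hidden)
        return self.lm_head(self.backbone(vit_tokens))  # next-token logits


class ImagePathway(nn.Module):
    """Image decoding path with its own parameters; it consumes VAE latents,
    so the understanding backbone never has to be re-adapted to them."""

    def __init__(self, latent_dim=16, hidden=1024):
        super().__init__()
        self.latent_proj = nn.Linear(latent_dim, hidden)  # decoupled image-tokenizer stand-in
        self.denoiser = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.out_proj = nn.Linear(hidden, latent_dim)

    def forward(self, vae_latents, text_hidden):
        # Condition image denoising on text features via simple concatenation (assumption).
        x = torch.cat([self.latent_proj(vae_latents), text_hidden], dim=1)
        x = self.denoiser(x)
        return self.out_proj(x[:, : vae_latents.size(1)])


if __name__ == "__main__":
    text_path, image_path = TextPathway(), ImagePathway()
    vit_tokens = torch.randn(1, 77, 1024)     # tokens from the understanding model
    vae_latents = torch.randn(1, 256, 16)     # noisy latents for the image decoder
    logits = text_path(vit_tokens)
    denoised = image_path(vae_latents, text_path.backbone(vit_tokens))
    print(logits.shape, denoised.shape)       # (1, 77, 32000), (1, 256, 16)
```

The point of the separation, as the abstract describes it, is that the image pathway can be trained on VAE latents from scratch while the text pathway's weights, and hence the base model's text-generation capability, remain intact.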