OmniGen2: Exploration to Advanced Multimodal Generation
June 23, 2025
Authors: Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, Ze Liu, Ziyi Xia, Chaofan Li, Haoge Deng, Jiahao Wang, Kun Luo, Bo Zhang, Defu Lian, Xinlong Wang, Zhongyuan Wang, Tiejun Huang, Zheng Liu
cs.AI
Abstract
In this work, we introduce OmniGen2, a versatile and open-source generative
model designed to provide a unified solution for diverse generation tasks,
including text-to-image, image editing, and in-context generation. Unlike
OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image
modalities, utilizing unshared parameters and a decoupled image tokenizer. This
design enables OmniGen2 to build upon existing multimodal understanding models
without the need to re-adapt VAE inputs, thereby preserving the original text
generation capabilities. To facilitate the training of OmniGen2, we develop
comprehensive data construction pipelines, encompassing image editing and
in-context generation data. Additionally, we introduce a reflection mechanism
tailored for image generation tasks and curate a dedicated reflection dataset
based on OmniGen2. Despite its relatively modest parameter size, OmniGen2
achieves competitive results on multiple task benchmarks, including
text-to-image and image editing. To further evaluate in-context generation,
also referred to as subject-driven tasks, we introduce a new benchmark named
OmniContext. OmniGen2 achieves state-of-the-art performance among open-source
models in terms of consistency. We will release our models, training code,
datasets, and data construction pipeline to support future research in this
field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link:
https://github.com/VectorSpaceLab/OmniGen2
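
To make the abstract's central architectural claim concrete, the sketch below illustrates what a decoupled dual-pathway design could look like: a text path that decodes from the understanding model's tokens with untouched parameters, and a separate image path that reads VAE latents through its own projection (standing in for the decoupled image tokenizer). This is a minimal, hypothetical PyTorch sketch; all class names, layer sizes, and the simple concatenation-based conditioning are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a decoupled dual-pathway generator (not the authors' code).
# Text and image decoding use unshared parameters; only the image path touches VAE latents.
import torch
import torch.nn as nn


class TextPathway(nn.Module):
    """Text decoding path; its parameters are kept separate so the original
    text-generation ability of the understanding model is preserved."""

    def __init__(self, hidden=1024, vocab=32000):
        super().__init__()
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, vit_tokens):                      # (B, N, hidden)
        return self.lm_head(self.backbone(vit_tokens))  # next-token logits


class ImagePathway(nn.Module):
    """Image decoding path with its own parameters; it consumes VAE latents,
    so the understanding backbone never has to be re-adapted to them."""

    def __init__(self, latent_dim=16, hidden=1024):
        super().__init__()
        self.latent_proj = nn.Linear(latent_dim, hidden)  # decoupled image-tokenizer stand-in
        self.denoiser = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.out_proj = nn.Linear(hidden, latent_dim)

    def forward(self, vae_latents, text_hidden):
        # Condition image denoising on text features via simple concatenation (assumption).
        x = torch.cat([self.latent_proj(vae_latents), text_hidden], dim=1)
        x = self.denoiser(x)
        return self.out_proj(x[:, : vae_latents.size(1)])


if __name__ == "__main__":
    text_path, image_path = TextPathway(), ImagePathway()
    vit_tokens = torch.randn(1, 77, 1024)     # tokens from the understanding model
    vae_latents = torch.randn(1, 256, 16)     # noisy latents for the image decoder
    logits = text_path(vit_tokens)
    denoised = image_path(vae_latents, text_path.backbone(vit_tokens))
    print(logits.shape, denoised.shape)       # (1, 77, 32000), (1, 256, 16)
```

The point of the separation, as the abstract describes it, is that the image pathway can be trained on VAE latents from scratch while the text pathway's weights, and hence the base model's text-generation capability, remain intact.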