

OmniGen2: Exploration to Advanced Multimodal Generation

June 23, 2025
作者: Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, Ze Liu, Ziyi Xia, Chaofan Li, Haoge Deng, Jiahao Wang, Kun Luo, Bo Zhang, Defu Lian, Xinlong Wang, Zhongyuan Wang, Tiejun Huang, Zheng Liu
cs.AI

Abstract

In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image generation, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for the text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design enables OmniGen2 to build upon existing multimodal understanding models without the need to re-adapt VAE inputs, thereby preserving the original text generation capabilities. To facilitate the training of OmniGen2, we develop comprehensive data construction pipelines covering image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored for image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image generation and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipelines to support future research in this field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link: https://github.com/VectorSpaceLab/OmniGen2
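To make the dual-pathway idea concrete, below is a minimal PyTorch sketch of the design described in the abstract: a frozen multimodal understanding backbone whose features feed two separate decoders, a text head and an image decoder that share no parameters. This is an illustrative assumption-based sketch, not the authors' implementation; the module names, layer choices, and dimensions (`DualPathGenerator`, `hidden`, `image_latent`, etc.) are hypothetical placeholders, and the real image pathway in OmniGen2 is a diffusion-based decoder with its own image tokenizer.

```python
import torch
import torch.nn as nn


class DualPathGenerator(nn.Module):
    """Illustrative dual-pathway design: a frozen understanding backbone
    feeds two decoders (text and image) that share no parameters."""

    def __init__(self, hidden=1024, vocab=32000, image_latent=16):
        super().__init__()
        # Stand-in for a pretrained multimodal understanding model; kept
        # frozen so its original text-generation ability is preserved.
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=2,
        )
        for p in self.backbone.parameters():
            p.requires_grad = False

        # Text pathway: a language-modeling head over backbone features.
        self.text_head = nn.Linear(hidden, vocab)

        # Image pathway: its own decoder with unshared parameters; in the
        # paper this role is played by a diffusion decoder conditioned on
        # the backbone, producing latents for a separate image tokenizer/VAE.
        self.image_decoder = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, image_latent),
        )

    def forward(self, token_embeddings, mode="image"):
        features = self.backbone(token_embeddings)
        if mode == "text":
            return self.text_head(features)   # logits over the text vocabulary
        return self.image_decoder(features)   # image-latent predictions


if __name__ == "__main__":
    model = DualPathGenerator()
    x = torch.randn(1, 8, 1024)               # dummy sequence of embeddings
    print(model(x, mode="text").shape)        # torch.Size([1, 8, 32000])
    print(model(x, mode="image").shape)       # torch.Size([1, 8, 16])
```

The point of the separation is visible in the sketch: because the image decoder is an independent branch, the backbone never has to ingest VAE latents as input, so the text pathway and its pretrained capabilities remain untouched.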