

OmniGen2: Exploration to Advanced Multimodal Generation

June 23, 2025
作者: Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, Ze Liu, Ziyi Xia, Chaofan Li, Haoge Deng, Jiahao Wang, Kun Luo, Bo Zhang, Defu Lian, Xinlong Wang, Zhongyuan Wang, Tiejun Huang, Zheng Liu
cs.AI

Abstract

In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image generation, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for the text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design enables OmniGen2 to build upon existing multimodal understanding models without the need to re-adapt VAE inputs, thereby preserving the original text generation capabilities. To facilitate the training of OmniGen2, we develop comprehensive data construction pipelines covering image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored for image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image generation and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipelines to support future research in this field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link: https://github.com/VectorSpaceLab/OmniGen2
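To make the dual-pathway idea concrete, below is a minimal PyTorch sketch of the design described in the abstract: a frozen multimodal understanding backbone whose features feed two separate decoders, a text head and an image decoder that share no parameters. This is an illustrative assumption-based sketch, not the authors' implementation; the module names, layer choices, and dimensions (`DualPathGenerator`, `hidden`, `image_latent`, etc.) are hypothetical placeholders, and the real image pathway in OmniGen2 is a diffusion-based decoder with its own image tokenizer.

```python
import torch
import torch.nn as nn


class DualPathGenerator(nn.Module):
    """Illustrative dual-pathway design: a frozen understanding backbone
    feeds two decoders (text and image) that share no parameters."""

    def __init__(self, hidden=1024, vocab=32000, image_latent=16):
        super().__init__()
        # Stand-in for a pretrained multimodal understanding model; kept
        # frozen so its original text-generation ability is preserved.
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=2,
        )
        for p in self.backbone.parameters():
            p.requires_grad = False

        # Text pathway: a language-modeling head over backbone features.
        self.text_head = nn.Linear(hidden, vocab)

        # Image pathway: its own decoder with unshared parameters; in the
        # paper this role is played by a diffusion decoder conditioned on
        # the backbone, producing latents for a separate image tokenizer/VAE.
        self.image_decoder = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, image_latent),
        )

    def forward(self, token_embeddings, mode="image"):
        features = self.backbone(token_embeddings)
        if mode == "text":
            return self.text_head(features)   # logits over the text vocabulary
        return self.image_decoder(features)   # image-latent predictions


if __name__ == "__main__":
    model = DualPathGenerator()
    x = torch.randn(1, 8, 1024)               # dummy sequence of embeddings
    print(model(x, mode="text").shape)        # torch.Size([1, 8, 32000])
    print(model(x, mode="image").shape)       # torch.Size([1, 8, 16])
```

The point of the separation is visible in the sketch: because the image decoder is an independent branch, the backbone never has to ingest VAE latents as input, so the text pathway and its pretrained capabilities remain untouched.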