OmniGen2: Exploration to Advanced Multimodal Generation

Abstract

In this work, we introduce OmniGen2, a versatile, open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image generation, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for the text and image modalities, with unshared parameters and a decoupled image tokenizer. This design lets OmniGen2 build on existing multimodal understanding models without re-adapting VAE inputs, thereby preserving their original text generation capabilities. To facilitate the training of OmniGen2, we developed comprehensive data construction pipelines covering image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored for image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image generation and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. On this benchmark, OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipeline to support future research in this field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link: https://github.com/VectorSpaceLab/OmniGen2
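
The abstract does not spell out how the reflection mechanism operates, so the following is only a conceptual sketch, assuming a simple generate, critique, and retry loop. Every name here (`generate_with_reflection`, `Attempt`, the toy generator and critic) is hypothetical and chosen for illustration; none of it is OmniGen2's actual API, and strings stand in for real images.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    image: str                # stand-in for real image data
    critique: Optional[str]   # None means the critic found nothing to fix


def generate_with_reflection(
    prompt: str,
    generate: Callable[[str, List[Attempt]], str],
    critique: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, List[Attempt]]:
    """Generate, critique the result, and retry with the critique in context."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    history: List[Attempt] = []
    for _ in range(max_rounds):
        image = generate(prompt, history)   # earlier attempts are visible
        feedback = critique(prompt, image)  # textual reflection, or None
        history.append(Attempt(image, feedback))
        if feedback is None:                # critic is satisfied: stop early
            break
    return history[-1].image, history


# Toy stand-ins (assumptions, not real models).
REQUIRED = ["red cube", "blue ball"]


def toy_critique(prompt: str, image: str) -> Optional[str]:
    missing = [item for item in REQUIRED if item not in image]
    return "missing: " + ", ".join(missing) if missing else None


def toy_generate(prompt: str, history: List[Attempt]) -> str:
    if not history:
        return "red cube"  # deliberately incomplete first draft
    # Add whatever the previous critique flagged as missing.
    flagged = history[-1].critique.split(": ", 1)[1]
    return history[-1].image + ", " + flagged
```

Calling `generate_with_reflection("a red cube next to a blue ball", toy_generate, toy_critique)` runs two rounds in this toy setup: the first draft is criticized for a missing object, and the second attempt uses that critique to correct itself.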
