DreamO: A Unified Framework for Image Customization

Abstract

Recently, extensive research on image customization (e.g., identity, subject, style, and background) has demonstrated strong customization capabilities in large-scale generative models. However, most approaches are designed for specific tasks, which limits their ability to combine different types of conditions. Developing a unified framework for image customization remains an open challenge. In this paper, we present DreamO, an image customization framework designed to support a wide range of tasks while facilitating seamless integration of multiple conditions. Specifically, DreamO utilizes a diffusion transformer (DiT) framework to uniformly process inputs of different types. During training, we construct a large-scale training dataset that includes various customization tasks, and we introduce a feature routing constraint to facilitate the precise querying of relevant information from reference images. Additionally, we design a placeholder strategy that associates specific placeholders with conditions at particular positions, enabling control over the placement of conditions in the generated results. Moreover, we employ a progressive training strategy consisting of three stages: an initial stage focused on simple tasks with limited data to establish baseline consistency, a full-scale training stage to comprehensively enhance the customization capabilities, and a final quality alignment stage to correct quality biases introduced by low-quality data. Extensive experiments demonstrate that the proposed DreamO can effectively perform various image customization tasks with high quality and flexibly integrate different types of control conditions.
