OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Abstract

Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly preserving identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines that use style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose OmniConsistency, a universal consistency plugin built on large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy that decouples style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to the commercial state-of-the-art model GPT-4o.
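The abstract does not describe the module's internals. As a rough illustration of how a separate consistency component can coexist with arbitrary style LoRAs, and how freezing one component while training another keeps style learning and consistency learning apart, the sketch below stacks two low-rank updates additively on one frozen linear layer. It assumes, purely for illustration, that both the style component and the consistency component are LoRA-style adapters on the same layer; `LoRA`, `AdaptedLinear`, `matvec`, and the adapter names `"style"` and `"consistency"` are hypothetical. This is a generic sketch of additive adapter stacking with staged freezing, not OmniConsistency's implementation.

```python
# Hypothetical sketch: generic additive LoRA stacking with staged freezing.
# This is NOT OmniConsistency's code; the abstract does not specify its internals.
from dataclasses import dataclass, field
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product; each row of m has len(x) entries."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


@dataclass
class LoRA:
    """Low-rank update scale * (B @ A) with A: (r x d_in), B: (d_out x r)."""
    A: Matrix
    B: Matrix
    scale: float = 1.0
    trainable: bool = True

    def delta(self, x: Vector) -> Vector:
        # Compute B (A x) so the full d_out x d_in update is never materialized.
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


@dataclass
class AdaptedLinear:
    """A frozen base weight W plus any number of additively stacked adapters."""
    W: Matrix
    adapters: Dict[str, LoRA] = field(default_factory=dict)

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.W, x)
        for lora in self.adapters.values():
            y = [yi + di for yi, di in zip(y, lora.delta(x))]
        return y

    def trainable(self) -> List[str]:
        # Names of adapters whose parameters an optimizer would update.
        return [name for name, lora in self.adapters.items() if lora.trainable]


# Hypothetical stage 1: only the style adapter is trained.
layer = AdaptedLinear(W=[[1.0, 0.0], [0.0, 1.0]])
layer.adapters["style"] = LoRA(A=[[1.0, 0.0]], B=[[1.0], [0.0]])

# Hypothetical stage 2: freeze the style adapter, then add and train a separate
# consistency adapter, so its gradients never touch the style update.
layer.adapters["style"].trainable = False
layer.adapters["consistency"] = LoRA(A=[[0.0, 1.0]], B=[[0.0], [1.0]], scale=0.5)

print(layer.trainable())          # ['consistency']
print(layer.forward([1.0, 2.0]))  # [2.0, 3.0]

# Plug-and-play: swap in a different (frozen) style adapter, keep the consistency one.
layer.adapters["style"] = LoRA(A=[[0.0, 1.0]], B=[[1.0], [0.0]], trainable=False)
print(layer.forward([1.0, 2.0]))  # [3.0, 3.0]
```

Because each adapter contributes an independent additive term, replacing the style adapter changes only that term while the consistency term is reused unchanged. Whether OmniConsistency attaches its module in this way is not stated in the abstract.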