MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement

Abstract

Multi-subject personalized generation presents unique challenges in maintaining identity fidelity and semantic coherence when synthesizing images conditioned on multiple reference subjects. Existing methods often suffer from identity blending and attribute leakage because they do not adequately model how different subjects should interact within a shared representation space. We present MOSAIC, a representation-centric framework that rethinks multi-subject generation through explicit semantic correspondence and orthogonal feature disentanglement. Our key insight is that multi-subject generation requires precise semantic alignment at the representation level: the model must know exactly which regions of the generated image should attend to which parts of each reference. To enable this, we introduce SemAlign-MS, a meticulously annotated dataset that provides fine-grained semantic correspondences between multiple reference subjects and target images, a resource previously unavailable in this domain. Building on this foundation, we propose a semantic correspondence attention loss that enforces precise point-to-point semantic alignment, ensuring high consistency between each reference and its designated regions. Furthermore, we develop a multi-reference disentanglement loss that pushes different subjects into orthogonal attention subspaces, preventing feature interference while preserving each subject's identity characteristics. Extensive experiments demonstrate that MOSAIC achieves state-of-the-art performance on multiple benchmarks. Notably, while existing methods typically degrade beyond three subjects, MOSAIC maintains high fidelity with four or more reference subjects, opening new possibilities for complex multi-subject synthesis applications.
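The two losses named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The abstract does not give the exact formulations, so the function names, the cross-entropy form of the correspondence term, and the cosine-similarity form of the disentanglement term are all assumptions chosen only to show the general idea: reward attention on annotated reference points, and penalize overlap between the attention maps of different subjects.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def correspondence_attention_loss(attn_logits, correspondences):
    """Hypothetical point-to-point alignment term (assumed form).

    attn_logits[i][j]: score of target token i attending to reference token j.
    correspondences: annotated (target_idx, reference_idx) pairs.
    Returns the mean negative log-probability that each annotated target
    token attends to its matched reference token.
    """
    loss = 0.0
    for t, r in correspondences:
        probs = softmax(attn_logits[t])
        loss += -math.log(probs[r] + 1e-12)
    return loss / len(correspondences)


def disentanglement_loss(attn_maps):
    """Hypothetical orthogonality term (assumed form).

    attn_maps[k][i]: non-negative attention mass that target token i
    assigns to subject k. Returns the mean pairwise cosine similarity,
    which is 0 when the maps of different subjects do not overlap.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb + 1e-12)

    n = len(attn_maps)
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    if not pairs:  # a single subject has nothing to be disentangled from
        return 0.0
    return sum(cosine(attn_maps[a], attn_maps[b]) for a, b in pairs) / len(pairs)
```

In this sketch, both terms would be added to the usual generation objective with weighting coefficients. Those weights, like the loss forms themselves, are assumptions and are not specified in the abstract.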
