MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
September 2, 2025
Authors: Dong She, Siming Fu, Mushui Liu, Qiaoqiao Jin, Hualiang Wang, Mu Liu, Jidong Jiang
cs.AI
Abstract
Multi-subject personalized generation presents unique challenges in
maintaining identity fidelity and semantic coherence when synthesizing images
conditioned on multiple reference subjects. Existing methods often suffer from
identity blending and attribute leakage due to inadequate modeling of how
different subjects should interact within shared representation spaces. We
present MOSAIC, a representation-centric framework that rethinks multi-subject
generation through explicit semantic correspondence and orthogonal feature
disentanglement. Our key insight is that multi-subject generation requires
precise semantic alignment at the representation level: knowing exactly which
regions in the generated image should attend to which parts of each reference.
To enable this, we introduce SemAlign-MS, a meticulously annotated dataset
that provides fine-grained semantic correspondences between multiple reference
subjects and target images, a resource previously unavailable in this domain. Building on
this foundation, we propose the semantic correspondence attention loss to
enforce precise point-to-point semantic alignment, ensuring high consistency
from each reference to its designated regions. Furthermore, we develop the
multi-reference disentanglement loss to push different subjects into orthogonal
attention subspaces, preventing feature interference while preserving
individual identity characteristics. Extensive experiments demonstrate that
MOSAIC achieves state-of-the-art performance on multiple benchmarks. Notably,
while existing methods typically degrade beyond 3 subjects, MOSAIC maintains
high fidelity with 4+ reference subjects, opening new possibilities for complex
multi-subject synthesis applications.
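
The abstract names two training objectives but gives no formulas for them. The sketch below is only an illustration of the general idea, not the paper's definition: it assumes a row-normalized attention matrix from target tokens to reference tokens, treats the correspondence loss as a negative log-likelihood at annotated point pairs, and treats the disentanglement loss as the mean squared cosine similarity between per-subject attention maps. The function names, argument layouts, and both loss forms are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0) and division by zero


def correspondence_attention_loss(attn, pairs):
    """Hypothetical point-to-point alignment loss.

    attn[t][r] is the attention weight from target token t to reference
    token r (each row is assumed to sum to 1). pairs is a list of
    (target_idx, ref_idx) annotated correspondences. The loss is the mean
    negative log attention placed on the annotated reference token.
    """
    return -sum(math.log(attn[t][r] + EPS) for t, r in pairs) / len(pairs)


def disentanglement_loss(subject_maps):
    """Hypothetical orthogonality penalty between subjects.

    subject_maps[k][t] is the attention mass that target token t assigns
    to reference subject k. The loss is the mean squared cosine similarity
    over all subject pairs, so it is 0 when the maps are mutually orthogonal.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v + EPS)

    n = len(subject_maps)
    sims = [
        cosine(subject_maps[i], subject_maps[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(sims) / len(sims) if sims else 0.0


if __name__ == "__main__":
    attn = [[0.9, 0.1], [0.2, 0.8]]
    print(correspondence_attention_loss(attn, [(0, 0), (1, 1)]))
    print(disentanglement_loss([[1.0, 0.0], [0.0, 1.0]]))
```

Under these assumptions, attention that concentrates on the annotated reference point lowers the first loss, and subjects that claim disjoint target regions drive the second loss toward zero, which matches the abstract's description of aligned yet non-interfering references.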