MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
September 2, 2025
Authors: Dong She, Siming Fu, Mushui Liu, Qiaoqiao Jin, Hualiang Wang, Mu Liu, Jidong Jiang
cs.AI
Abstract
Multi-subject personalized generation presents unique challenges in
maintaining identity fidelity and semantic coherence when synthesizing images
conditioned on multiple reference subjects. Existing methods often suffer from
identity blending and attribute leakage due to inadequate modeling of how
different subjects should interact within shared representation spaces. We
present MOSAIC, a representation-centric framework that rethinks multi-subject
generation through explicit semantic correspondence and orthogonal feature
disentanglement. Our key insight is that multi-subject generation requires
precise semantic alignment at the representation level: knowing exactly which
regions in the generated image should attend to which parts of each reference.
To enable this, we introduce SemAlign-MS, a meticulously annotated dataset
providing fine-grained semantic correspondences between multiple reference
subjects and target images, previously unavailable in this domain. Building on
this foundation, we propose the semantic correspondence attention loss to
enforce precise point-to-point semantic alignment, ensuring high consistency
from each reference to its designated regions. Furthermore, we develop the
multi-reference disentanglement loss to push different subjects into orthogonal
attention subspaces, preventing feature interference while preserving
individual identity characteristics. Extensive experiments demonstrate that
MOSAIC achieves state-of-the-art performance on multiple benchmarks. Notably,
while existing methods typically degrade beyond three subjects, MOSAIC maintains
high fidelity with four or more reference subjects, opening new possibilities for complex
multi-subject synthesis applications.
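
The abstract names two training objectives but gives no formulas. The sketch below is one plausible reading of them, not the authors' formulation. It assumes a row-normalized attention matrix from target tokens to the concatenated reference tokens, a list of annotated (target token, reference token) pairs, and a per-token subject label. The function names, the cross-entropy form of the correspondence term, and the squared-cosine form of the disentanglement term are all illustrative assumptions.

```python
import numpy as np


def correspondence_loss(attn, pairs, eps=1e-8):
    """Hypothetical point-to-point alignment term.

    attn  : (T, R) array, each row a distribution over R reference tokens.
    pairs : list of (target_idx, ref_idx) annotated correspondences.
    Returns the mean negative log attention mass on the annotated pairs.
    """
    t_idx, r_idx = np.array(pairs).T
    return float(-np.log(attn[t_idx, r_idx] + eps).mean())


def disentanglement_loss(attn, subject_ids, eps=1e-8):
    """Hypothetical orthogonality term between subjects.

    subject_ids : (R,) int array mapping each reference token to a subject.
    For each subject, sum the attention its tokens receive to get a map over
    target tokens, then penalize squared cosine similarity between the maps
    of different subjects. Non-overlapping maps give a loss of 0.
    """
    subjects = np.unique(subject_ids)
    if len(subjects) < 2:
        return 0.0  # nothing to disentangle with a single subject
    # (S, T): attention mass each target token assigns to each subject
    maps = np.stack([attn[:, subject_ids == s].sum(axis=1) for s in subjects])
    maps = maps / (np.linalg.norm(maps, axis=1, keepdims=True) + eps)
    gram = maps @ maps.T
    off_diag = gram[~np.eye(len(subjects), dtype=bool)]
    return float((off_diag ** 2).mean())
```

Under these assumptions, an attention pattern in which each target region attends only to its own subject drives both terms toward zero, while a uniform pattern that blends subjects is penalized by both. How the paper weights or combines the two losses is not stated in the abstract.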