MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement

Abstract

Multi-subject personalized generation presents unique challenges in maintaining identity fidelity and semantic coherence when synthesizing images conditioned on multiple reference subjects. Existing methods often suffer from identity blending and attribute leakage because they do not adequately model how different subjects should interact within shared representation spaces. We present MOSAIC, a representation-centric framework that rethinks multi-subject generation through explicit semantic correspondence and orthogonal feature disentanglement. Our key insight is that multi-subject generation requires precise semantic alignment at the representation level: knowing exactly which regions in the generated image should attend to which parts of each reference. To enable this, we introduce SemAlign-MS, a meticulously annotated dataset providing fine-grained semantic correspondences between multiple reference subjects and target images, previously unavailable in this domain. Building on this foundation, we propose the semantic correspondence attention loss to enforce precise point-to-point semantic alignment, ensuring high consistency between each reference and its designated regions. Furthermore, we develop the multi-reference disentanglement loss to push different subjects into orthogonal attention subspaces, preventing feature interference while preserving individual identity characteristics. Extensive experiments demonstrate that MOSAIC achieves state-of-the-art performance on multiple benchmarks. Notably, while existing methods typically degrade beyond three subjects, MOSAIC maintains high fidelity with four or more reference subjects, opening new possibilities for complex multi-subject synthesis applications.
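The abstract names two losses without giving their formulas. As a rough illustration only, the sketch below shows one plausible reading of each: a cross-entropy term that pulls annotated target tokens toward their matched reference tokens, and a pairwise cosine penalty that pushes per-subject attention maps toward orthogonality. The function names, the `(T, R)` attention layout, and both loss formulas are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def correspondence_attention_loss(attn, pairs, eps=1e-8):
    """Hypothetical correspondence loss (assumed form, not the paper's).

    attn:  (T, R) array; row t is the attention of target token t over
           all reference tokens and is assumed to sum to 1.
    pairs: list of (target_idx, ref_idx) annotated correspondences.
    Returns the mean negative log attention on the annotated pairs.
    """
    t_idx, r_idx = np.asarray(pairs).T
    return float(-np.log(attn[t_idx, r_idx] + eps).mean())


def disentanglement_loss(attn, subject_of_ref, eps=1e-8):
    """Hypothetical disentanglement loss (assumed form, not the paper's).

    subject_of_ref: (R,) integer array giving the subject id of each
                    reference token.
    Builds one spatial map per subject (the attention mass each target
    token assigns to that subject) and returns the mean pairwise cosine
    similarity between maps; 0 means the maps are orthogonal.
    """
    subjects = np.unique(subject_of_ref)
    if len(subjects) < 2:
        return 0.0
    # (S, T): total attention from every target token to each subject.
    maps = np.stack(
        [attn[:, subject_of_ref == s].sum(axis=1) for s in subjects]
    )
    maps = maps / (np.linalg.norm(maps, axis=1, keepdims=True) + eps)
    gram = maps @ maps.T
    off_diagonal = gram[~np.eye(len(subjects), dtype=bool)]
    return float(off_diagonal.mean())
```

Under these assumptions, an attention matrix in which each target token attends only to its matched reference token drives both terms to roughly zero, while uniform attention yields a large correspondence term and a disentanglement term close to one. In training, the two terms would presumably be weighted and added to the usual generation objective; the weighting is not specified in the abstract.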
