MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement

Abstract

Multi-subject personalized generation must preserve both identity fidelity and semantic coherence when synthesizing images conditioned on multiple reference subjects. Existing methods often suffer from identity blending and attribute leakage because they do not adequately model how different subjects should interact within a shared representation space. We present MOSAIC, a representation-centric framework that rethinks multi-subject generation through explicit semantic correspondence and orthogonal feature disentanglement. Our key insight is that multi-subject generation requires precise semantic alignment at the representation level: the model must know exactly which regions of the generated image should attend to which parts of each reference. To enable this, we introduce SemAlign-MS, a meticulously annotated dataset that provides fine-grained semantic correspondences between multiple reference subjects and target images, which were previously unavailable in this domain. Building on this dataset, we propose a semantic correspondence attention loss that enforces precise point-to-point semantic alignment, ensuring high consistency between each reference and its designated regions. We further develop a multi-reference disentanglement loss that pushes different subjects into orthogonal attention subspaces, preventing feature interference while preserving each subject's identity characteristics. Extensive experiments demonstrate that MOSAIC achieves state-of-the-art performance on multiple benchmarks. Notably, while existing methods typically degrade beyond 3 subjects, MOSAIC maintains high fidelity with 4 or more reference subjects, opening new possibilities for complex multi-subject synthesis applications.
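The abstract names two training losses but gives no formulas for them. As a rough illustration only, the sketch below shows one plausible shape such losses could take: a cross-entropy term that rewards a target token for attending to its annotated reference token, and a mean pairwise cosine-similarity term that is zero when per-subject attention maps are orthogonal. The function names, the `(target, reference)` pair format, and the choice of cross-entropy and cosine similarity are all assumptions made for this example, not the paper's definitions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def correspondence_attention_loss(attn_logits, correspondences):
    """Hypothetical correspondence loss (assumed form, not from the paper).

    attn_logits[t][r] is the raw attention score from target token t to
    reference token r. correspondences is a list of (t, r) index pairs,
    an assumed stand-in for point-to-point annotations.
    Returns the mean cross-entropy of attending to the matched token.
    """
    if not correspondences:
        return 0.0
    total = 0.0
    for t, r in correspondences:
        probs = softmax(attn_logits[t])
        total += -math.log(probs[r])
    return total / len(correspondences)


def disentanglement_loss(attn_maps):
    """Hypothetical disentanglement loss (assumed form, not from the paper).

    attn_maps[s] is a flattened, non-negative attention map for subject s.
    Returns the mean pairwise cosine similarity, which is 0 when all maps
    are mutually orthogonal (for example, disjoint spatial support).
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b + 1e-8)

    pairs = [(i, j) for i in range(len(attn_maps))
             for j in range(i + 1, len(attn_maps))]
    if not pairs:
        return 0.0
    return sum(cosine(attn_maps[i], attn_maps[j]) for i, j in pairs) / len(pairs)
```

In this toy form, attention that already follows the annotated pairs yields a correspondence loss near zero, and two subjects whose attention maps never overlap yield a disentanglement loss of zero. A real training setup would compute both terms on attention tensors inside the generator and weight them against the main generation objective; those details are not stated in the abstract.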
