ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation

October 13, 2025
Authors: Ruihang Xu, Dewei Zhou, Fan Ma, Yi Yang
cs.AI

Abstract

Multi-instance image generation (MIG) remains a significant challenge for modern diffusion models because of key limitations in precisely controlling object layout and preserving the identity of multiple distinct subjects. To address these limitations, we introduce ContextGen, a novel Diffusion Transformer framework for multi-instance generation that is guided by both layout and reference images. Our approach integrates two key technical contributions: a Contextual Layout Anchoring (CLA) mechanism, which incorporates the composite layout image into the generation context to robustly anchor objects in their desired positions, and Identity Consistency Attention (ICA), an innovative attention mechanism that leverages contextual reference images to ensure the identity consistency of multiple instances. Recognizing the lack of large-scale, hierarchically structured datasets for this task, we introduce IMIG-100K, the first dataset with detailed layout and identity annotations. Extensive experiments demonstrate that ContextGen sets a new state of the art, outperforming existing methods in control precision, identity fidelity, and overall visual quality.
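The abstract does not specify how ICA restricts attention, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes a hypothetical token order (image tokens in row-major order, followed by a fixed number of reference tokens per instance) and builds a boolean mask in which image tokens inside an instance's layout box may attend to that instance's reference tokens. The function name `build_identity_mask`, the box format, and `ref_len` are all assumptions introduced for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (row0, col0, row1, col1), end-exclusive, in token-grid units.
Box = Tuple[int, int, int, int]


def build_identity_mask(height: int, width: int, boxes: List[Box], ref_len: int) -> List[List[bool]]:
    """Return mask[q][k] == True when query token q may attend to key token k.

    Assumed token order (not from the paper): height*width image tokens in
    row-major order, followed by ref_len reference tokens per instance.
    """
    n_img = height * width
    n_total = n_img + ref_len * len(boxes)
    mask = [[False] * n_total for _ in range(n_total)]

    # Image tokens always attend to all image tokens (global scene context).
    for q in range(n_img):
        for k in range(n_img):
            mask[q][k] = True

    for i, (r0, c0, r1, c1) in enumerate(boxes):
        ref_start = n_img + i * ref_len
        ref_tokens = range(ref_start, ref_start + ref_len)
        # Reference tokens of instance i attend only to each other.
        for q in ref_tokens:
            for k in ref_tokens:
                mask[q][k] = True
        # Image tokens inside box i additionally attend to instance i's reference tokens.
        for r in range(r0, r1):
            for c in range(c0, c1):
                q = r * width + c
                for k in ref_tokens:
                    mask[q][k] = True
    return mask


# Two instances on a 2x4 token grid: left half and right half.
mask = build_identity_mask(height=2, width=4, boxes=[(0, 0, 2, 2), (0, 2, 2, 4)], ref_len=3)
```

In this toy setup, a token in the left box can see the first instance's reference tokens but not the second's, which is the kind of isolation that would keep identities from leaking between instances. A real Diffusion Transformer would pass such a mask to its attention layers as a tensor.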