ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation

October 13, 2025
Authors: Ruihang Xu, Dewei Zhou, Fan Ma, Yi Yang
cs.AI

Abstract

Multi-instance image generation (MIG) remains a significant challenge for modern diffusion models because they struggle both to control object layout precisely and to preserve the identity of multiple distinct subjects. To address these limitations, we introduce ContextGen, a novel Diffusion Transformer framework for multi-instance generation that is guided by both layout and reference images. Our approach integrates two key technical contributions: a Contextual Layout Anchoring (CLA) mechanism that incorporates the composite layout image into the generation context to robustly anchor the objects in their desired positions, and Identity Consistency Attention (ICA), an innovative attention mechanism that leverages contextual reference images to ensure the identity consistency of multiple instances. Recognizing the lack of large-scale, hierarchically structured datasets for this task, we introduce IMIG-100K, the first dataset with detailed layout and identity annotations. Extensive experiments demonstrate that ContextGen sets a new state of the art, outperforming existing methods in control precision, identity fidelity, and overall visual quality.
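
To make the term "composite layout image" concrete, the sketch below pastes each reference image into its target bounding box on a blank canvas. The abstract does not say how ContextGen builds its composite, so the function names (`resize_nearest`, `composite_layout`), the nearest-neighbor resize, the white background, and the rule that later instances overwrite earlier ones are all assumptions made for illustration only.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, C) array to (out_h, out_w, C)."""
    in_h, in_w = img.shape[:2]
    # Map every output row/column back to a source row/column.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]


def composite_layout(canvas_hw, instances, background=255):
    """Paste each reference image into its box on a blank canvas.

    instances: list of (reference_image, (x0, y0, x1, y1)) in pixel
    coordinates. Later entries overwrite earlier ones where boxes overlap
    (an assumed convention, not one taken from the paper).
    """
    h, w = canvas_hw
    canvas = np.full((h, w, 3), background, dtype=np.uint8)
    for ref, (x0, y0, x1, y1) in instances:
        canvas[y0:y1, x0:x1] = resize_nearest(ref, y1 - y0, x1 - x0)
    return canvas


# Two toy "reference images": a red 4x4 patch and a blue 2x2 patch.
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 200
blue = np.zeros((2, 2, 3), dtype=np.uint8)
blue[..., 2] = 200

layout = composite_layout(
    (16, 16),
    [(red, (0, 0, 8, 8)), (blue, (6, 6, 12, 12))],
)
```

In a pipeline of the kind the abstract describes, an image like `layout` would be supplied to the model as context alongside the individual reference images; how it is encoded and attended to is not specified here.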