LayerComposer: Interactive Personalized Text-to-Image Generation via a Spatially-Aware Layered Canvas

Abstract

Despite their impressive visual fidelity, existing personalized generative models lack interactive control over spatial composition and scale poorly to multiple subjects. To address these limitations, we present LayerComposer, an interactive framework for personalized, multi-subject text-to-image generation. Our approach makes two main contributions: (1) a layered canvas, a novel representation in which each subject is placed on its own layer, enabling occlusion-free composition; and (2) a locking mechanism that preserves selected layers with high fidelity while allowing the remaining layers to adapt flexibly to the surrounding context. As in professional image-editing software, the layered canvas lets users place, resize, or lock input subjects through intuitive layer manipulation. The locking mechanism is versatile and requires no architectural changes; it relies instead on the model's inherent positional embeddings combined with a new complementary data sampling strategy. Extensive experiments demonstrate that LayerComposer achieves better spatial control and identity preservation than state-of-the-art methods in multi-subject personalized image generation.
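
The user-facing side of the layered canvas described above (one layer per subject, with place, resize, and lock operations) can be illustrated with a small data-structure sketch. This is only an assumption about how such a canvas might be represented on the interface side; the names Layer, LayeredCanvas, place, resize, lock, and split are hypothetical and do not come from the paper. It does not model the generative model, the positional embeddings, or the data sampling strategy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Layer:
    """One input subject on its own layer (hypothetical structure)."""
    subject: str
    x: float = 0.0        # normalized horizontal position on the canvas
    y: float = 0.0        # normalized vertical position on the canvas
    scale: float = 1.0    # relative size of the subject
    locked: bool = False  # True: preserve this layer as given


@dataclass
class LayeredCanvas:
    """A stack of independent layers, so subjects never overwrite each other."""
    layers: list[Layer] = field(default_factory=list)

    def add(self, subject: str) -> Layer:
        layer = Layer(subject)
        self.layers.append(layer)
        return layer

    def place(self, layer: Layer, x: float, y: float) -> None:
        layer.x, layer.y = x, y

    def resize(self, layer: Layer, scale: float) -> None:
        if scale <= 0:
            raise ValueError("scale must be positive")
        layer.scale = scale

    def lock(self, layer: Layer) -> None:
        layer.locked = True

    def split(self) -> tuple[list[Layer], list[Layer]]:
        """Return (locked layers, layers left free to adapt to context)."""
        locked = [l for l in self.layers if l.locked]
        free = [l for l in self.layers if not l.locked]
        return locked, free


canvas = LayeredCanvas()
person = canvas.add("person")
dog = canvas.add("dog")
canvas.place(person, 0.2, 0.5)
canvas.resize(dog, 0.5)
canvas.lock(person)
locked, free = canvas.split()
```

Keeping each subject in its own record, rather than flattening everything into one image, is what lets a layer be moved, scaled, or locked without touching the others.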