LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas
October 23, 2025
Authors: Guocheng Gordon Qian, Ruihang Zhang, Tsai-Shien Chen, Yusuf Dalva, Anujraaj Argo Goyal, Willi Menapace, Ivan Skorokhodov, Meng Dong, Arpit Sahni, Daniil Ostashev, Ju Hu, Sergey Tulyakov, Kuan-Chieh Jackson Wang
cs.AI
Abstract
Despite their impressive visual fidelity, existing personalized generative
models lack interactive control over spatial composition and scale poorly to
multiple subjects. To address these limitations, we present LayerComposer, an
interactive framework for personalized, multi-subject text-to-image generation.
Our approach introduces two main contributions: (1) a layered canvas, a novel
representation in which each subject is placed on a distinct layer, enabling
occlusion-free composition; and (2) a locking mechanism that preserves selected
layers with high fidelity while allowing the remaining layers to adapt flexibly
to the surrounding context. Similar to professional image-editing software, the
proposed layered canvas allows users to place, resize, or lock input subjects
through intuitive layer manipulation. Our versatile locking mechanism requires
no architectural changes, relying instead on inherent positional embeddings
combined with a new complementary data sampling strategy. Extensive experiments
demonstrate that LayerComposer achieves superior spatial control and identity
preservation compared to state-of-the-art methods in multi-subject
personalized image generation.
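
To make the layered-canvas idea more concrete, the following is a minimal Python sketch of the user-facing operations the abstract names (place, resize, lock). It is an illustration only: the `Layer` and `LayeredCanvas` classes, their fields, and the method names are hypothetical and are not taken from the paper, and the sketch does not model how the generative model consumes the canvas or how locking is realized through positional embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One subject on its own layer (hypothetical structure, not from the paper)."""
    subject: str           # identifier for the input subject image
    x: float               # top-left corner in canvas pixels
    y: float
    width: float
    height: float
    locked: bool = False   # locked layers are the ones meant to be preserved

    def place(self, x: float, y: float) -> None:
        """Move the layer to a new position on the canvas."""
        self.x, self.y = x, y

    def resize(self, scale: float) -> None:
        """Scale the layer uniformly by a positive factor."""
        if scale <= 0:
            raise ValueError("scale must be positive")
        self.width *= scale
        self.height *= scale


@dataclass
class LayeredCanvas:
    """A canvas holding one layer per subject (hypothetical structure)."""
    width: int
    height: int
    layers: list[Layer] = field(default_factory=list)

    def add(self, layer: Layer) -> Layer:
        self.layers.append(layer)
        return layer

    def locked_subjects(self) -> list[str]:
        """Subjects the user asked to keep with high fidelity."""
        return [layer.subject for layer in self.layers if layer.locked]

    def adaptable_subjects(self) -> list[str]:
        """Subjects left free to adapt to the surrounding context."""
        return [layer.subject for layer in self.layers if not layer.locked]


# Example session: two subjects, one locked, one moved and enlarged.
canvas = LayeredCanvas(width=1024, height=1024)
person = canvas.add(Layer("person_a", x=100, y=200, width=300, height=600))
dog = canvas.add(Layer("dog", x=500, y=600, width=200, height=200))
person.locked = True
dog.place(450, 650)
dog.resize(1.5)
```

Keeping each subject in its own `Layer` record mirrors the abstract's point that subjects sit on distinct layers, so editing one layer's position or size leaves the others untouched.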