Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

December 17, 2025
Authors: Shengming Yin, Zekai Zhang, Zecheng Tang, Kaiyuan Gao, Xiao Xu, Kun Yan, Jiahao Li, Yilei Chen, Yuxiang Chen, Heung-Yeung Shum, Lionel M. Ni, Jingren Zhou, Junyang Lin, Chenfei Wu
cs.AI

Abstract

Recent visual generative models often struggle with consistency during image editing due to the entangled nature of raster images, where all visual content is fused into a single canvas. In contrast, professional design tools employ layered representations, allowing isolated edits while preserving consistency. Motivated by this, we propose Qwen-Image-Layered, an end-to-end diffusion model that decomposes a single RGB image into multiple semantically disentangled RGBA layers, enabling inherent editability, where each RGBA layer can be independently manipulated without affecting other content. To support variable-length decomposition, we introduce three key components: (1) an RGBA-VAE to unify the latent representations of RGB and RGBA images; (2) a VLD-MMDiT (Variable Layers Decomposition MMDiT) architecture capable of decomposing a variable number of image layers; and (3) a Multi-stage Training strategy to adapt a pretrained image generation model into a multilayer image decomposer. Furthermore, to address the scarcity of high-quality multilayer training images, we build a pipeline to extract and annotate multilayer images from Photoshop documents (PSD). Experiments demonstrate that our method significantly surpasses existing approaches in decomposition quality and establishes a new paradigm for consistent image editing. Our code and models are released at https://github.com/QwenLM/Qwen-Image-Layered
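
To make the layered-representation idea concrete, the sketch below flattens a stack of RGBA layers back into an RGB image with the standard alpha "over" operator, then recolors one layer to show that pixels outside that layer are unaffected. It is not the Qwen-Image-Layered model or its API: the function name `composite`, the flat pixel-list format, straight (non-premultiplied) alpha, and the white background are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical pixel formats for this sketch: channels are floats in [0, 1],
# alpha is straight (non-premultiplied), and an image is a flat list of pixels.
RGBA = Tuple[float, float, float, float]
RGB = Tuple[float, float, float]


def composite(layers: List[List[RGBA]], background: RGB = (1.0, 1.0, 1.0)) -> List[RGB]:
    """Flatten RGBA layers (ordered bottom to top) onto an opaque background."""
    n_pixels = len(layers[0]) if layers else 0
    out: List[RGB] = [background] * n_pixels
    for layer in layers:
        # Standard "over" operator against an opaque destination:
        # result = alpha * source + (1 - alpha) * destination
        out = [
            tuple(a * src + (1.0 - a) * dst_c for src, dst_c in zip((r, g, b), dst))
            for (r, g, b, a), dst in zip(layer, out)
        ]
    return out


# A two-pixel "image": an opaque blue bottom layer, and a top layer that is
# opaque red on pixel 0 and fully transparent on pixel 1.
bottom = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]
top = [(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)]
original = composite([bottom, top])

# Edit only the top layer: recolor it to green while keeping its alpha mask.
edited_top = [(0.0, 1.0, 0.0, a) for (_, _, _, a) in top]
edited = composite([bottom, edited_top])
```

Here `edited[0]` changes from red to green, while `edited[1]` is identical to `original[1]` because the edited layer is transparent there. This is the isolation property that a flat raster image cannot offer, and it is why a decomposition into RGBA layers makes edits local by construction.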