Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
December 17, 2025
Authors: Shengming Yin, Zekai Zhang, Zecheng Tang, Kaiyuan Gao, Xiao Xu, Kun Yan, Jiahao Li, Yilei Chen, Yuxiang Chen, Heung-Yeung Shum, Lionel M. Ni, Jingren Zhou, Junyang Lin, Chenfei Wu
cs.AI
Abstract
Recent visual generative models often struggle with consistency during image editing due to the entangled nature of raster images, where all visual content is fused into a single canvas. In contrast, professional design tools employ layered representations, which allow isolated edits while preserving consistency. Motivated by this, we propose Qwen-Image-Layered, an end-to-end diffusion model that decomposes a single RGB image into multiple semantically disentangled RGBA layers. This enables inherent editability: each RGBA layer can be manipulated independently without affecting other content. To support variable-length decomposition, we introduce three key components: (1) an RGBA-VAE that unifies the latent representations of RGB and RGBA images; (2) a VLD-MMDiT (Variable Layers Decomposition MMDiT) architecture capable of decomposing an image into a variable number of layers; and (3) a multi-stage training strategy that adapts a pretrained image generation model into a multilayer image decomposer. Furthermore, to address the scarcity of high-quality multilayer training images, we build a pipeline to extract and annotate multilayer images from Photoshop documents (PSD). Experiments demonstrate that our method significantly surpasses existing approaches in decomposition quality and establishes a new paradigm for consistent image editing. Our code and models are available at https://github.com/QwenLM/Qwen-Image-Layered
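To make the layered representation concrete, the following is a minimal sketch of how a stack of RGBA layers can be flattened back into an RGB image, and how editing one layer leaves the others untouched. It is not taken from the released code: the standard "over" compositing operator, the black backdrop, and the names `over`, `composite`, and `shift_right` are all assumptions chosen for illustration, and pixels are plain Python tuples rather than tensors.

```python
from typing import List, Tuple

# Assumed pixel format: straight (non-premultiplied) alpha, channels in [0, 1].
RGBA = Tuple[float, float, float, float]
RGB = Tuple[float, float, float]
Layer = List[List[RGBA]]  # rows of pixels


def over(top: RGBA, bottom: RGBA) -> RGBA:
    """Composite one straight-alpha pixel over another (the "over" operator)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: List[Layer]) -> List[List[RGB]]:
    """Flatten layers (ordered bottom to top) onto an opaque black backdrop."""
    height, width = len(layers[0]), len(layers[0][0])
    canvas = [[(0.0, 0.0, 0.0, 1.0)] * width for _ in range(height)]
    for layer in layers:
        canvas = [
            [over(layer[y][x], canvas[y][x]) for x in range(width)]
            for y in range(height)
        ]
    return [[px[:3] for px in row] for row in canvas]


def shift_right(layer: Layer, dx: int) -> Layer:
    """Move a layer dx pixels to the right; vacated pixels become transparent."""
    clear = (0.0, 0.0, 0.0, 0.0)
    width = len(layer[0])
    return [
        [row[x - dx] if 0 <= x - dx < width else clear for x in range(width)]
        for row in layer
    ]


# Hypothetical 1x3 example: an opaque blue background and one red foreground pixel.
BLUE, RED, CLEAR = (0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)
background: Layer = [[BLUE, BLUE, BLUE]]
foreground: Layer = [[RED, CLEAR, CLEAR]]

original = composite([background, foreground])
edited = composite([background, shift_right(foreground, 1)])
```

In this toy example, moving the foreground layer changes only where the red pixel lands; the background layer is never modified, so the blue it contributes is revealed again at the vacated position. In a single flattened raster image, the same move would require inpainting the region the object used to cover.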