
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

May 24, 2025
Authors: Zitong Wang, Hang Zhao, Qianyu Zhou, Xuequan Lu, Xiangtai Li, Yiren Song
cs.AI

Abstract

Diffusion models have recently achieved great success in many generation tasks, such as object removal. Nevertheless, existing image decomposition methods struggle to disentangle semi-transparent or transparent layer occlusions because they depend on mask priors, assume static objects, and lack suitable datasets. In this paper, we study a novel task: Layer-Wise Decomposition of Alpha-Composited Images, which aims to recover the constituent layers from a single overlapped image under non-linear occlusion by semi-transparent or transparent alpha layers. To address the challenges of layer ambiguity, generalization, and data scarcity, we first introduce AlphaBlend, the first large-scale, high-quality dataset for transparent and semi-transparent layer decomposition, supporting six real-world subtasks (e.g., translucent flare removal, semi-transparent cell decomposition, glassware decomposition). Building on this dataset, we present DiffDecompose, a diffusion Transformer-based framework that learns the posterior over possible layer decompositions conditioned on the input image, semantic prompts, and blending type. Rather than regressing alpha mattes directly, DiffDecompose performs In-Context Decomposition, which enables the model to predict one or multiple layers without per-layer supervision, and introduces Layer Position Encoding Cloning to maintain pixel-level correspondence across layers. Extensive experiments on the proposed AlphaBlend dataset and the public LOGO dataset verify the effectiveness of DiffDecompose. The code and dataset will be released upon paper acceptance; the code will be available at: https://github.com/Wangzt1121/DiffDecompose.
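To make the layer ambiguity mentioned in the abstract concrete, the following is a minimal sketch of the standard "over" alpha-compositing operator, C = alpha * F + (1 - alpha) * B, applied to a single RGB pixel. It is an illustration only and not the paper's method: the function name `composite_over` and the pixel values are assumptions chosen for the example, and the blending types handled by DiffDecompose may differ from this linear model.

```python
def composite_over(fg, alpha, bg):
    """Standard 'over' operator per channel: C = alpha * F + (1 - alpha) * B.

    fg and bg are RGB tuples with values in [0, 1]; alpha is a scalar in [0, 1].
    This is the textbook compositing model, assumed here for illustration.
    """
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


# Decomposition A: half-transparent red layer over a blue background.
pixel_a = composite_over((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0))

# Decomposition B: fully opaque purple layer over a black background.
pixel_b = composite_over((0.5, 0.0, 0.5), 1.0, (0.0, 0.0, 0.0))

# Both layer configurations produce the same observed pixel, so the
# composite alone does not determine the layers.
print(pixel_a, pixel_b, pixel_a == pixel_b)
```

Because distinct layer configurations can yield the same composite, recovering layers from one image is ill-posed, which is consistent with the abstract's framing of the problem as learning a posterior over possible decompositions rather than a single regression target.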