DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers
May 24, 2025
Authors: Zitong Wang, Hang Zhao, Qianyu Zhou, Xuequan Lu, Xiangtai Li, Yiren Song
cs.AI
Abstract
Diffusion models have recently achieved great success in many generation
tasks, such as object removal. Nevertheless, existing image decomposition methods
struggle to disentangle semi-transparent or transparent layer occlusions due to
mask prior dependencies, static object assumptions, and the lack of datasets.
In this paper, we study a novel task: Layer-Wise Decomposition of
Alpha-Composited Images, which aims to recover the constituent layers from a
single overlapped image under non-linear occlusion by semi-transparent or
transparent alpha layers. To address challenges in layer ambiguity,
generalization, and data scarcity, we first introduce AlphaBlend, the first
large-scale and high-quality dataset for transparent and semi-transparent layer
decomposition, supporting six real-world subtasks (e.g., translucent flare
removal, semi-transparent cell decomposition, glassware decomposition).
Building on this dataset, we present DiffDecompose, a diffusion
Transformer-based framework that learns the posterior over possible layer
decompositions conditioned on the input image, semantic prompts, and blending
type. Rather than regressing alpha mattes directly, DiffDecompose performs
In-Context Decomposition, enabling the model to predict one or multiple layers
without per-layer supervision, and introduces Layer Position Encoding Cloning
to maintain pixel-level correspondence across layers. Extensive experiments on
the proposed AlphaBlend dataset and the public LOGO dataset verify the
effectiveness of DiffDecompose. The code and dataset will be released upon
paper acceptance; the code will be hosted at:
https://github.com/Wangzt1121/DiffDecompose.
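
The layer ambiguity mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard linear "over" blend, I = alpha * F + (1 - alpha) * B; the abstract does not state the blending model, and the paper targets harder non-linear cases. The pixel values and the function name `alpha_composite` are hypothetical and are not taken from the DiffDecompose code.

```python
import numpy as np

def alpha_composite(fg, alpha, bg):
    """Standard 'over' blend (an assumption): I = alpha * fg + (1 - alpha) * bg."""
    return alpha * fg + (1.0 - alpha) * bg

# Hypothetical single RGB pixel, values in [0, 1].
bg = np.array([0.2, 0.4, 0.6])

# First explanation of the pixel: a white layer at 50% opacity.
fg_a, alpha_a = np.array([1.0, 1.0, 1.0]), 0.5
composite = alpha_composite(fg_a, alpha_a, bg)

# Second explanation: pick a different opacity and solve for the
# foreground that reproduces exactly the same composite.
alpha_b = 0.8
fg_b = (composite - (1.0 - alpha_b) * bg) / alpha_b

# Both (foreground, alpha) pairs yield the same observed pixel.
same = np.allclose(alpha_composite(fg_b, alpha_b, bg), composite)
```

Because two distinct layer pairs reproduce the same pixel even in this linear case, the composite alone does not determine a unique answer, which is consistent with the abstract's choice to learn a posterior over possible decompositions rather than regress a single alpha matte.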