DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers
May 24, 2025
Authors: Zitong Wang, Hang Zhao, Qianyu Zhou, Xuequan Lu, Xiangtai Li, Yiren Song
cs.AI
Abstract
Diffusion models have recently achieved great success in many generation
tasks, such as object removal. Nevertheless, existing image decomposition methods
struggle to disentangle semi-transparent or transparent layer occlusions due to
mask prior dependencies, static object assumptions, and the lack of datasets.
In this paper, we delve into a novel task, Layer-Wise Decomposition of
Alpha-Composited Images, which aims to recover the constituent layers from a
single overlapped image under non-linear occlusion by semi-transparent or
transparent alpha layers. To address the challenges of layer ambiguity,
generalization, and data scarcity, we first introduce AlphaBlend, the first
large-scale and high-quality dataset for transparent and semi-transparent layer
decomposition, supporting six real-world subtasks (e.g., translucent flare
removal, semi-transparent cell decomposition, glassware decomposition).
Building on this dataset, we present DiffDecompose, a diffusion
Transformer-based framework that learns the posterior over possible layer
decompositions conditioned on the input image, semantic prompts, and blending
type. Rather than regressing alpha mattes directly, DiffDecompose performs
In-Context Decomposition, enabling the model to predict one or multiple layers
without per-layer supervision, and introduces Layer Position Encoding Cloning
to maintain pixel-level correspondence across layers. Extensive experiments on
the proposed AlphaBlend dataset and the public LOGO dataset verify the
effectiveness of DiffDecompose. The code and dataset will be released upon
paper acceptance at:
https://github.com/Wangzt1121/DiffDecompose.
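The task inverts alpha compositing, and the layer ambiguity named in the abstract is easy to see in a toy example. The sketch below assumes the standard per-pixel "over" operator, I = αF + (1 − α)B; the blending types actually covered by AlphaBlend are not specified in this abstract, and the helper names `composite_over` and `images_match` are hypothetical. It shows three different (foreground, alpha, background) triples that produce the same one-pixel composite, so the observed image alone cannot determine its layers.

```python
import math
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel in [0, 1]


def composite_over(fg: List[Pixel], alpha: List[float], bg: List[Pixel]) -> List[Pixel]:
    """Hypothetical helper: standard 'over' compositing, I = a*F + (1 - a)*B per pixel.

    This is an assumed blending model, not necessarily one used by AlphaBlend.
    """
    if not (len(fg) == len(alpha) == len(bg)):
        raise ValueError("foreground, alpha and background must have the same length")
    out: List[Pixel] = []
    for f, a, b in zip(fg, alpha, bg):
        out.append(tuple(a * fc + (1.0 - a) * bc for fc, bc in zip(f, b)))
    return out


def images_match(x: List[Pixel], y: List[Pixel], tol: float = 1e-9) -> bool:
    """Hypothetical helper: True if two images agree channel by channel within tol."""
    return len(x) == len(y) and all(
        math.isclose(cx, cy, abs_tol=tol) for px, py in zip(x, y) for cx, cy in zip(px, py)
    )


# Three different decompositions of the same one-pixel image (0.5, 0.0, 0.5).
red_over_blue = composite_over([(1.0, 0.0, 0.0)], [0.5], [(0.0, 0.0, 1.0)])
mixed_layers = composite_over([(0.75, 0.0, 0.25)], [0.5], [(0.25, 0.0, 0.75)])
opaque_purple = composite_over([(0.5, 0.0, 0.5)], [1.0], [(0.2, 0.9, 0.1)])

print(images_match(red_over_blue, mixed_layers))   # True
print(images_match(red_over_blue, opaque_purple))  # True
```

Because many layer triples explain the same pixels, this toy case is one way to read the abstract's choice to learn a posterior over possible decompositions rather than regress alpha mattes directly.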
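The abstract does not detail how In-Context Decomposition or Layer Position Encoding Cloning are implemented. One plausible reading, offered here only as an assumption, is that the input image and the layers to be predicted share a single token sequence, and the tokens of every predicted layer reuse (clone) the 2D position indices of the input-image tokens, so the token at a given (row, col) refers to the same pixel in each layer. The function `build_position_ids` and its (segment, row, col) layout are hypothetical; `clone=False` shows a naive alternative that shifts each segment sideways and therefore loses that alignment.

```python
from typing import List, Tuple

PosId = Tuple[int, int, int]  # (segment, row, col); segment 0 is the input image


def build_position_ids(h: int, w: int, num_layers: int, clone: bool = True) -> List[PosId]:
    """Hypothetical position ids for [input tokens] + num_layers x [layer tokens].

    With clone=True every layer segment copies the input grid's (row, col)
    indices, so token k in each segment maps to the same pixel location.
    """
    if h <= 0 or w <= 0 or num_layers < 1:
        raise ValueError("need a non-empty grid and at least one output layer")
    grid = [(r, c) for r in range(h) for c in range(w)]
    ids: List[PosId] = []
    for seg in range(num_layers + 1):  # segment 0 = input, 1..num_layers = predicted layers
        for r, c in grid:
            if clone:
                ids.append((seg, r, c))
            else:
                # Naive alternative: tile segments side by side along the width axis.
                ids.append((seg, r, c + seg * w))
    return ids


# A 1x2 input grid with one predicted layer.
print(build_position_ids(1, 2, num_layers=1))
# [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)]
```

Varying `num_layers` only changes how many segments are appended, which is one way a single model could be asked for one or multiple layers; whether DiffDecompose uses a segment index like this is not stated in the abstract.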