DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

Abstract

Diffusion models have recently achieved great success in many generation tasks such as object removal. Nevertheless, existing image decomposition methods struggle to disentangle semi-transparent or transparent layer occlusions because they depend on mask priors, assume static objects, and lack suitable datasets. In this paper, we study a novel task, Layer-Wise Decomposition of Alpha-Composited Images, which aims to recover the constituent layers from a single overlapped image under non-linear occlusion by semi-transparent or transparent alpha layers. To address the challenges of layer ambiguity, generalization, and data scarcity, we first introduce AlphaBlend, the first large-scale, high-quality dataset for transparent and semi-transparent layer decomposition, supporting six real-world subtasks (e.g., translucent flare removal, semi-transparent cell decomposition, glassware decomposition). Building on this dataset, we present DiffDecompose, a diffusion Transformer-based framework that learns the posterior over possible layer decompositions conditioned on the input image, semantic prompts, and blending type. Rather than regressing alpha mattes directly, DiffDecompose performs In-Context Decomposition, enabling the model to predict one or multiple layers without per-layer supervision, and introduces Layer Position Encoding Cloning to maintain pixel-level correspondence across layers. Extensive experiments on the proposed AlphaBlend dataset and the public LOGO dataset verify the effectiveness of DiffDecompose. The code and dataset will be released upon paper acceptance; the code will be available at https://github.com/Wangzt1121/DiffDecompose.
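
To make the layer ambiguity mentioned in the abstract concrete, the following minimal sketch applies the standard "over" compositing rule, C = alpha * F + (1 - alpha) * B, to a single RGB pixel. It assumes this textbook rule as the simplest blend type; the blending types used in AlphaBlend may differ, and the function and variable names are hypothetical, not taken from the DiffDecompose code.

```python
# Illustrative sketch only: the standard "over" operator on one RGB pixel.
# Names are hypothetical and do not come from the DiffDecompose repository.

def composite_over(foreground, alpha, background):
    """Blend one RGB pixel: C = alpha * F + (1 - alpha) * B, per channel."""
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))

# Decomposition A: a white layer at 50% opacity over a black background.
composite_a = composite_over((1.0, 1.0, 1.0), 0.5, (0.0, 0.0, 0.0))

# Decomposition B: a light-grey layer at 50% opacity over a dark-grey background.
composite_b = composite_over((0.75, 0.75, 0.75), 0.5, (0.25, 0.25, 0.25))

# Both layer pairs yield the same observed colour, so the composite alone
# does not determine which layers produced it.
same = composite_a == composite_b
```

Because distinct layer pairs can collapse to the same composite, recovering the layers from one image is under-determined, which is consistent with the abstract's choice to model a posterior over possible decompositions rather than a single regressed answer.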
