Controllable Layer Decomposition for Reversible Multi-Layer Image Generation

Abstract

This work presents Controllable Layer Decomposition (CLD), a method for fine-grained, controllable separation of a raster image into multiple layers. In practical workflows, designers typically generate and edit each RGBA layer independently before compositing the layers into a final raster image. Compositing is irreversible, however: once the layers are flattened, layer-level editing is no longer possible. Existing methods commonly rely on image matting and inpainting, but remain limited in controllability and segmentation precision. To address these challenges, we propose two key modules: LayerDecompose-DiT (LD-DiT), which decouples image elements into distinct layers and enables fine-grained control; and the Multi-Layer Conditional Adapter (MLCA), which injects target image information into multi-layer tokens to achieve precise conditional generation. For comprehensive evaluation, we build a new benchmark and introduce tailored evaluation metrics. Experimental results show that CLD consistently outperforms existing methods in both decomposition quality and controllability. Furthermore, the layers produced by CLD can be edited directly in commonly used design tools such as PowerPoint, highlighting its practical value in real-world creative workflows.
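
The irreversibility of compositing mentioned above can be illustrated with a minimal sketch. It is not part of CLD; it assumes straight (non-premultiplied) alpha, channel values in [0, 1], and the standard Porter-Duff "over" operator, and the names `over`, `composite`, `stack_a`, and `stack_b` are hypothetical, chosen only for this example. It composites the layers of a single pixel and shows that two different layer stacks can flatten to the same value, so the flattened image alone does not determine the layers.

```python
from typing import Sequence, Tuple

# One pixel as (r, g, b, a); straight alpha, all channels in [0, 1] (assumption).
RGBA = Tuple[float, float, float, float]


def over(top: RGBA, bottom: RGBA) -> RGBA:
    """Porter-Duff 'over': place `top` on `bottom` for a single pixel."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Fully transparent result: colour is undefined, return zeros.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        # Alpha-weighted mix of the two colours, normalised by the output alpha.
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: Sequence[RGBA]) -> RGBA:
    """Flatten one pixel's layers, listed from bottom to top."""
    result: RGBA = (0.0, 0.0, 0.0, 0.0)  # start from a transparent canvas
    for layer in layers:
        result = over(layer, result)
    return result


# Two different layer stacks for the same pixel position.
stack_a = [(0.5, 0.5, 0.5, 1.0)]                        # one opaque grey layer
stack_b = [(0.0, 0.0, 0.0, 1.0), (1.0, 1.0, 1.0, 0.5)]  # black, then 50% white

# Both flatten to the same opaque grey pixel, (0.5, 0.5, 0.5, 1.0).
print(composite(stack_a))
print(composite(stack_b))
```

Because many layer stacks map to the same flattened pixel, recovering layers from a raster image is an underdetermined problem, which is consistent with the abstract's emphasis on controllability when choosing among possible decompositions.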