Does Synthetic Layered Design Data Help Layered Design Decomposition?

Abstract

Recent advances in image generation have made it easy to produce high-quality images. However, these outputs are inherently flat: foreground elements, background, and text are entangled within a fixed canvas. As a result, flexible post-generation editing remains challenging, leaving a clear last-mile gap toward practical usability. Existing approaches either rely on scarce proprietary layered assets or construct partially synthetic data from limited structural priors, and both strategies face fundamental scalability limits. In this work, we investigate whether purely synthetic layered data can improve graphic design decomposition. We hypothesize that, in graphic design, effective decomposition does not require modeling inter-layer dependencies as precisely as in natural-image composition, since design elements are often intentionally arranged as modular and semantically separable components. Concretely, we conduct a data-centric study built on CLD, a state-of-the-art layer decomposition framework. On top of this baseline, we construct our own synthetic dataset, SynLayers, generate textual supervision using vision-language models (VLMs), and automate inference inputs with VLM-predicted bounding boxes. Our study reveals three key findings: (1) training on purely synthetic data alone can outperform training on non-scalable alternatives such as the widely used PrismLayersPro dataset, demonstrating the viability of synthetic data as a scalable and effective substitute; (2) performance improves consistently as the training set grows, with gains beginning to saturate at around 50K samples; and (3) synthetic data enables balanced control over layer-count distributions, avoiding the layer-count imbalance commonly observed in real-world datasets. We hope this data-centric study encourages broader adoption of synthetic data as a practical foundation for layered design editing systems.