PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models

May 28, 2025
Authors: Junwen Chen, Heyang Jiang, Yanbin Wang, Keming Wu, Ji Li, Chao Zhang, Keiji Yanai, Dong Chen, Yuhui Yuan
cs.AI

Abstract

Generating high-quality, multi-layer transparent images from text prompts can unlock a new level of creative control, allowing users to edit each layer as effortlessly as they edit the text output of an LLM. However, the development of multi-layer generative models lags behind that of conventional text-to-image models due to the absence of a large, high-quality corpus of multi-layer transparent data. In this paper, we address this fundamental challenge by: (i) releasing the first open, ultra-high-fidelity datasets of multi-layer transparent images with accurate alpha mattes, PrismLayers (200K images) and PrismLayersPro (20K images); (ii) introducing a training-free synthesis pipeline that generates such data on demand using off-the-shelf diffusion models; and (iii) delivering a strong, open-source multi-layer generation model, ART+, which matches the aesthetics of modern text-to-image generation models. The key technical contributions are LayerFLUX, which excels at generating high-quality single transparent layers with accurate alpha mattes, and MultiLayerFLUX, which composes multiple LayerFLUX outputs into complete images guided by a human-annotated semantic layout. To ensure higher quality, we apply a rigorous filtering stage to remove artifacts and semantic mismatches, followed by human selection. Fine-tuning the state-of-the-art ART model on our synthetic PrismLayersPro dataset yields ART+, which outperforms the original ART in 60% of head-to-head user-study comparisons and even matches the visual quality of images generated by the FLUX.1-[dev] model. We anticipate that our work will establish a solid dataset foundation for the multi-layer transparent image generation task, enabling research and applications that require precise, editable, and visually compelling layered imagery.
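A multi-layer transparent image is only useful if its layers, each carrying an alpha matte, stack back into one coherent picture. The abstract does not say how MultiLayerFLUX performs its composition, so the sketch below is not that method. It only illustrates the generic final step, under these assumptions: each layer is a straight (non-premultiplied) RGBA patch placed at a box from a layout, and layers are stacked bottom to top with the standard Porter-Duff "over" operator. The names `Layer`, `over`, and `composite` are hypothetical and do not come from PrismLayers.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Straight (non-premultiplied) RGBA pixel, each channel in [0, 1].
RGBA = Tuple[float, float, float, float]


@dataclass
class Layer:
    """Hypothetical transparent layer: an RGBA patch plus its layout position."""
    pixels: List[List[RGBA]]  # rows of RGBA pixels
    x: int  # left offset of the layer's box on the canvas
    y: int  # top offset of the layer's box on the canvas


def over(src: RGBA, dst: RGBA) -> RGBA:
    """Porter-Duff 'over': place src on top of dst (straight alpha)."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        # Both pixels fully transparent: result is fully transparent.
        return (0.0, 0.0, 0.0, 0.0)

    def mix(cs: float, cd: float) -> float:
        # Weight each color by its effective coverage, then un-premultiply.
        return (cs * sa + cd * da * (1.0 - sa)) / out_a

    return (mix(sr, dr), mix(sg, dg), mix(sb, db), out_a)


def composite(width: int, height: int, layers: List[Layer],
              background: RGBA = (1.0, 1.0, 1.0, 1.0)) -> List[List[RGBA]]:
    """Stack layers onto a solid background; list order is bottom to top.

    Pixels that fall outside the canvas are clipped.
    """
    canvas = [[background for _ in range(width)] for _ in range(height)]
    for layer in layers:
        for j, row in enumerate(layer.pixels):
            cy = layer.y + j
            if not 0 <= cy < height:
                continue
            for i, px in enumerate(row):
                cx = layer.x + i
                if 0 <= cx < width:
                    canvas[cy][cx] = over(px, canvas[cy][cx])
    return canvas


if __name__ == "__main__":
    # An opaque blue 1x1 layer at the top-left, then a half-transparent
    # red 2x2 layer whose box starts at (1, 1) on a 3x3 white canvas.
    blue = Layer(pixels=[[(0.0, 0.0, 1.0, 1.0)]], x=0, y=0)
    red_half = Layer(pixels=[[(1.0, 0.0, 0.0, 0.5)] * 2 for _ in range(2)], x=1, y=1)
    img = composite(3, 3, [blue, red_half])
    print(img[0][0])  # (0.0, 0.0, 1.0, 1.0)
    print(img[1][1])  # (1.0, 0.5, 0.5, 1.0)
```

Because each layer keeps its own alpha matte until this final step, editing one layer (recoloring or moving it) only requires re-running the composite, which is the kind of per-layer control the abstract describes. A real pipeline would operate on image arrays rather than nested lists; the list form is used here only to keep the sketch dependency-free.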

PDF · 62 · May 29, 2025