PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models
May 28, 2025
Authors: Junwen Chen, Heyang Jiang, Yanbin Wang, Keming Wu, Ji Li, Chao Zhang, Keiji Yanai, Dong Chen, Yuhui Yuan
cs.AI
Abstract
Generating high-quality, multi-layer transparent images from text prompts can
unlock a new level of creative control, allowing users to edit each layer as
effortlessly as editing text outputs from LLMs. However, the development of
multi-layer generative models lags behind that of conventional text-to-image
models due to the absence of a large, high-quality corpus of multi-layer
transparent data. In this paper, we address this fundamental challenge by: (i)
releasing the first open, ultra-high-fidelity PrismLayers (PrismLayersPro)
dataset of 200K (20K) multi-layer transparent images with accurate alpha mattes,
(ii) introducing a training-free synthesis pipeline that generates such data on
demand using off-the-shelf diffusion models, and (iii) delivering a strong,
open-source multi-layer generation model, ART+, which matches the aesthetics of
modern text-to-image generation models. The key technical contributions
include: LayerFLUX, which excels at generating high-quality single transparent
layers with accurate alpha mattes, and MultiLayerFLUX, which composes multiple
LayerFLUX outputs into complete images, guided by human-annotated semantic
layouts. To ensure higher quality, we apply a rigorous filtering stage to remove
artifacts and semantic mismatches, followed by human selection. Fine-tuning the
state-of-the-art ART model on our synthetic PrismLayersPro yields ART+, which
outperforms the original ART in 60% of head-to-head user study comparisons and
even matches the visual quality of images generated by the FLUX.1-[dev] model.
We anticipate that our work will establish a solid dataset foundation for the
multi-layer transparent image generation task, enabling research and
applications that require precise, editable, and visually compelling layered
imagery.
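The abstract does not say how MultiLayerFLUX merges LayerFLUX outputs into a complete image. As a minimal illustration of what a multi-layer transparent image with alpha mattes means in practice, the sketch below flattens RGBA layers bottom-to-top using the standard Porter-Duff "over" operator with straight (non-premultiplied) alpha. This is a generic compositing technique, not the paper's method; the function names `over` and `composite_layers` and the layout format (one top-left placement per layer) are hypothetical.

```python
import numpy as np


def over(dst_rgb, dst_a, src_rgb, src_a):
    """Porter-Duff 'over' with straight alpha.

    rgb arrays have shape (H, W, 3), alpha arrays (H, W, 1), values in [0, 1].
    """
    out_a = src_a + dst_a * (1.0 - src_a)
    # Where the result is fully transparent the numerator is also 0,
    # so dividing by 1 instead of 0 yields black, transparent pixels.
    safe_a = np.where(out_a > 0, out_a, 1.0)
    out_rgb = (src_rgb * src_a + dst_rgb * dst_a * (1.0 - src_a)) / safe_a
    return out_rgb, out_a


def composite_layers(canvas_hw, layers):
    """Flatten RGBA layers onto a transparent canvas, bottom layer first.

    layers: list of (rgba, (top, left)) where rgba has shape (h, w, 4).
    Hypothetical layout format: each layer is placed at its top-left corner
    and must fit entirely inside the canvas.
    """
    H, W = canvas_hw
    rgb = np.zeros((H, W, 3))
    alpha = np.zeros((H, W, 1))
    for rgba, (top, left) in layers:
        h, w = rgba.shape[:2]
        if top < 0 or left < 0 or top + h > H or left + w > W:
            raise ValueError("layer does not fit inside the canvas")
        region = (slice(top, top + h), slice(left, left + w))
        rgb[region], alpha[region] = over(
            rgb[region], alpha[region], rgba[..., :3], rgba[..., 3:4]
        )
    return np.concatenate([rgb, alpha], axis=-1)
```

For example, placing a half-transparent red 2×2 layer at (1, 1) over an opaque white 4×4 background yields pinkish pixels `[1.0, 0.5, 0.5, 1.0]` inside the red box and white elsewhere. Keeping each layer separate until this final flattening step is what allows individual layers to be edited independently.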