PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models

Abstract

Generating high-quality, multi-layer transparent images from text prompts can unlock a new level of creative control, allowing users to edit each layer as effortlessly as editing text outputs from LLMs. However, the development of multi-layer generative models lags behind that of conventional text-to-image models due to the absence of a large, high-quality corpus of multi-layer transparent data. In this paper, we address this fundamental challenge by: (i) releasing the first open, ultra-high-fidelity PrismLayers (PrismLayersPro) dataset of 200K (20K) multi-layer transparent images with accurate alpha mattes, (ii) introducing a training-free synthesis pipeline that generates such data on demand using off-the-shelf diffusion models, and (iii) delivering a strong, open-source multi-layer generation model, ART+, which matches the aesthetics of modern text-to-image generation models. The key technical contributions are LayerFLUX, which excels at generating high-quality single transparent layers with accurate alpha mattes, and MultiLayerFLUX, which composes multiple LayerFLUX outputs into complete images guided by human-annotated semantic layouts. To ensure higher quality, we apply a rigorous filtering stage that removes artifacts and semantic mismatches, followed by human selection. Fine-tuning the state-of-the-art ART model on our synthetic PrismLayersPro dataset yields ART+, which outperforms the original ART in 60% of head-to-head user study comparisons and even matches the visual quality of images generated by the FLUX.1-[dev] model. We anticipate that our work will establish a solid dataset foundation for the multi-layer transparent image generation task, enabling research and applications that require precise, editable, and visually compelling layered imagery.
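The abstract describes stacking several transparent layers, each with its own alpha matte, into one complete image according to a layout. The sketch below illustrates only that final stacking step, using the standard Porter-Duff "over" operator on straight (non-premultiplied) RGBA arrays. It is not the MultiLayerFLUX method, which is diffusion-based. The function names `over` and `compose_layers`, the float RGBA range [0, 1], the bottom-to-top ordering, and the representation of the layout as top-left pixel offsets are all hypothetical assumptions made for illustration.

```python
from typing import Sequence, Tuple

import numpy as np


def over(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Porter-Duff 'over' for straight-alpha RGBA float arrays in [0, 1]."""
    src_rgb, src_a = src[..., :3], src[..., 3:4]
    dst_rgb, dst_a = dst[..., :3], dst[..., 3:4]
    # Combined coverage of the source on top of the destination.
    out_a = src_a + dst_a * (1.0 - src_a)
    # Premultiplied colour sum, then un-premultiply wherever coverage is nonzero.
    premul = src_rgb * src_a + dst_rgb * dst_a * (1.0 - src_a)
    out_rgb = np.divide(premul, out_a, out=np.zeros_like(premul), where=out_a > 0)
    return np.concatenate([out_rgb, out_a], axis=-1)


def compose_layers(
    canvas_hw: Tuple[int, int],
    placements: Sequence[Tuple[np.ndarray, Tuple[int, int]]],
) -> np.ndarray:
    """Stack RGBA layers bottom-to-top onto a transparent canvas.

    Hypothetical layout format: each placement is (layer, (top, left)),
    where (top, left) is the layer's top-left pixel offset on the canvas.
    """
    height, width = canvas_hw
    canvas = np.zeros((height, width, 4), dtype=np.float64)
    for layer, (top, left) in placements:
        h, w = layer.shape[:2]
        # This sketch assumes every layer fits entirely inside the canvas.
        assert top >= 0 and left >= 0 and top + h <= height and left + w <= width
        region = canvas[top:top + h, left:left + w]
        canvas[top:top + h, left:left + w] = over(layer, region)
    return canvas
```

For example, placing a half-transparent blue 1x1 layer over an opaque red 2x2 background yields a purple pixel where the two overlap and leaves the remaining pixels red. Keeping each layer separate until this last step is what allows individual layers to be edited and recomposited without regenerating the others.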
