LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition

Abstract

Media design layer generation enables the creation of fully editable, layered design documents such as posters, flyers, and logos from natural language prompts alone. Existing methods either restrict outputs to a fixed number of layers or require each layer to contain only spatially continuous regions, which causes the layer count to scale linearly with design complexity. We propose LaDe (Layered Media Design), a latent diffusion framework that generates a flexible number of semantically meaningful layers. LaDe combines three components: (1) an LLM-based prompt expander that transforms a short user intent into structured per-layer descriptions that guide generation; (2) a Latent Diffusion Transformer with a 4D RoPE positional encoding mechanism that jointly generates the full media design and its constituent RGBA layers; and (3) an RGBA VAE that decodes each layer with full alpha-channel support. By conditioning on layer samples during training, our unified framework supports three tasks: text-to-image generation, text-to-layers media design generation, and media design decomposition. We compare LaDe with Qwen-Image-Layered on the text-to-layers and image-to-layers tasks using the Crello test set. LaDe outperforms Qwen-Image-Layered in text-to-layers generation by improving text-to-layer alignment, as validated by two VLM-as-a-judge evaluators (GPT-4o mini and Qwen3-VL).
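To make the relationship between a full design and its constituent RGBA layers concrete, the following is a minimal sketch of flattening a stack of layers into one image. It assumes standard Porter-Duff "over" compositing with straight (non-premultiplied) alpha and layers ordered back to front; the abstract does not state how LaDe composites its layers, and the names `over` and `composite` are hypothetical, not part of any LaDe API.

```python
from typing import List, Tuple

# Straight (non-premultiplied) RGBA with every channel in [0, 1].
Pixel = Tuple[float, float, float, float]
# A layer is a list of rows of pixels; all layers share one canvas size.
Layer = List[List[Pixel]]


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over' operator for two straight-alpha pixels."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels fully transparent: colour is undefined, return clear.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: List[Layer]) -> Layer:
    """Flatten layers (assumed back-to-front order) into one RGBA image."""
    height, width = len(layers[0]), len(layers[0][0])
    canvas: Layer = [[(0.0, 0.0, 0.0, 0.0)] * width for _ in range(height)]
    for layer in layers:
        canvas = [
            [over(layer[y][x], canvas[y][x]) for x in range(width)]
            for y in range(height)
        ]
    return canvas


# Usage: an opaque white background under a layer that is transparent on
# the left and half-transparent red on the right.
background = [[(1.0, 1.0, 1.0, 1.0), (1.0, 1.0, 1.0, 1.0)]]
foreground = [[(0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.5)]]
flattened = composite([background, foreground])
```

Under these assumptions, the decomposition task can be read as the inverse problem: recovering a layer stack whose composite reproduces the input design.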

