PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

Abstract

Layout generation is the keystone of automated graphic design: it requires arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack the flexibility to handle varying design requirements. Our research introduces a unified framework for automated graphic layout generation that leverages a multi-modal large language model (MLLM) to accommodate diverse design tasks. In contrast to previous approaches, our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts under specific visual and textual constraints, including user-defined natural language specifications. We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks, demonstrating the effectiveness of our method. Moreover, recognizing the limitations of existing datasets in capturing the complexity of real-world graphic designs, we propose two new datasets for much more challenging tasks (user-constrained generation and complicated posters), further validating our model's utility in real-life settings. Marked by its superior accessibility and adaptability, this approach further automates large-scale graphic design tasks. The code and datasets will be publicly available at https://github.com/posterllava/PosterLLaVA.
