PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

June 5, 2024
Authors: Tao Yang, Yingmin Luo, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen
cs.AI

Abstract

Layout generation is the keystone of automated graphic design: it requires arranging the position and size of various multi-modal design elements in a manner that is visually pleasing and follows the given constraints. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic layout generation that leverages a multi-modal large language model (MLLM) to accommodate diverse design tasks. In contrast to previous approaches, our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts under specific visual and textual constraints, including user-defined natural language specifications. We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks, demonstrating the effectiveness of our method. Moreover, recognizing the limitations of existing datasets in capturing the complexity of real-world graphic designs, we propose two new datasets for much more challenging tasks (user-constrained generation and complicated posters), further validating our model's utility in real-life settings. Marked by its superior accessibility and adaptability, this approach further automates large-scale graphic design tasks. The code and datasets will be publicly available at https://github.com/posterllava/PosterLLaVA.
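
The abstract says layouts are generated as structured text (JSON) under visual and textual constraints, but it does not give the schema or prompt format. As a minimal sketch, assuming a hypothetical schema in which each element has a `label` and a `box` of normalized `[left, top, right, bottom]` coordinates, the code below shows how a layout could be serialized as a JSON training target, how a hypothetical instruction combining element constraints and a user request could be composed, and how generated text could be parsed and checked against those constraints. The image input and the MLLM itself are omitted; this is an illustration, not the PosterLLaVA implementation.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Element:
    label: str        # hypothetical element category, e.g. "title" or "logo"
    box: list[float]  # hypothetical [left, top, right, bottom], normalized to [0, 1]


def layout_to_json(elements: list[Element]) -> str:
    """Serialize a layout as structured text (JSON), e.g. as a tuning target."""
    return json.dumps({"layout": [asdict(e) for e in elements]})


def build_instruction(canvas_w: int, canvas_h: int,
                      labels: list[str], user_spec: str) -> str:
    """Compose a hypothetical text instruction: canvas, elements, user request."""
    return (
        f"Canvas size: {canvas_w}x{canvas_h}. "
        f"Place these elements: {json.dumps(labels)}. "
        f"User requirement: {user_spec} "
        'Answer in JSON: {"layout": [{"label": ..., "box": [l, t, r, b]}]}'
    )


def parse_layout(text: str, expected_labels: list[str]) -> list[Element]:
    """Parse generated JSON and check simple constraints (labels, box validity)."""
    data = json.loads(text)
    elements = [Element(e["label"], [float(v) for v in e["box"]])
                for e in data["layout"]]
    # Constraint 1: exactly the requested elements, in any order.
    if sorted(e.label for e in elements) != sorted(expected_labels):
        raise ValueError("output does not contain exactly the requested elements")
    # Constraint 2: every box lies inside the canvas and has positive area.
    for e in elements:
        left, top, right, bottom = e.box
        if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
            raise ValueError(f"invalid box for {e.label}: {e.box}")
    return elements


if __name__ == "__main__":
    target = layout_to_json([Element("title", [0.1, 0.05, 0.9, 0.2]),
                             Element("logo", [0.8, 0.85, 0.95, 0.95])])
    print(build_instruction(1080, 1920, ["title", "logo"],
                            "Put the title near the top."))
    print(parse_layout(target, ["title", "logo"]))
```

Keeping the output as plain JSON text lets a language model emit layouts with its ordinary decoder, while the parsing step rejects outputs that miss requested elements or place boxes outside the canvas.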