PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

Abstract

Layout generation is the keystone of automated graphic design: it requires arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic layout generation that leverages a multi-modal large language model (MLLM) to accommodate diverse design tasks. Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts under specific visual and textual constraints, including user-defined natural language specifications. We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks, demonstrating the effectiveness of our method. Moreover, recognizing the limitations of existing datasets in capturing the complexity of real-world graphic designs, we propose two new datasets for much more challenging tasks (user-constrained generation and complicated posters), further validating our model's utility in real-life settings. Marked by its superior accessibility and adaptability, this approach further automates large-scale graphic design tasks. The code and datasets will be publicly available at https://github.com/posterllava/PosterLLaVA.
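To make the idea of expressing a layout as structured text (JSON format) concrete, the following is a minimal Python sketch of how a set of design elements might be serialized to JSON and parsed back with a basic bounds check. The schema is an assumption made purely for illustration: the field names (`category`, `x`, `y`, `w`, `h`, `canvas`, `elements`), the normalized coordinate convention, and the helper functions `layout_to_json` and `parse_layout` are all hypothetical and are not taken from the PosterLLaVa code or datasets.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Element:
    """One design element. Hypothetical schema, for illustration only."""
    category: str  # e.g. "title", "logo", "text"
    x: float       # left edge, normalized to [0, 1] (assumed convention)
    y: float       # top edge, normalized to [0, 1] (assumed convention)
    w: float       # width, normalized to the canvas width
    h: float       # height, normalized to the canvas height


def layout_to_json(canvas_w: int, canvas_h: int, elements: list) -> str:
    """Serialize a layout to a JSON string that a language model could emit or consume."""
    payload = {
        "canvas": {"width": canvas_w, "height": canvas_h},
        "elements": [asdict(e) for e in elements],
    }
    return json.dumps(payload, sort_keys=True)


def parse_layout(text: str) -> list:
    """Parse a JSON layout string and reject boxes that leave the canvas."""
    data = json.loads(text)
    elements = [Element(**item) for item in data["elements"]]
    for e in elements:
        inside = (
            e.x >= 0 and e.y >= 0
            and e.w > 0 and e.h > 0
            and e.x + e.w <= 1 and e.y + e.h <= 1
        )
        if not inside:
            raise ValueError(f"element {e.category!r} lies outside the canvas")
    return elements


if __name__ == "__main__":
    layout = [
        Element("title", 0.1, 0.05, 0.8, 0.15),
        Element("logo", 0.4, 0.75, 0.2, 0.2),
    ]
    text = layout_to_json(1080, 1920, layout)
    print(text)
    print(parse_layout(text))
```

A text representation of this kind can be validated with ordinary JSON tooling before rendering, which is one plausible reason to prefer it over free-form model output; the actual format used by the authors may differ.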
