PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs
August 24, 2025
Authors: Zhilin Zhang, Xiang Zhang, Jiaqi Wei, Yiwei Xu, Chenyu You
cs.AI
Abstract
Multi-agent systems built upon large language models (LLMs) have demonstrated
remarkable capabilities in tackling complex compositional tasks. In this work,
we apply this paradigm to the paper-to-poster generation problem, a practical
yet time-consuming process faced by researchers preparing for conferences.
While recent approaches have attempted to automate this task, most neglect core
design and aesthetic principles, resulting in posters that require substantial
manual refinement. To address these design limitations, we propose PosterGen, a
multi-agent framework that mirrors the workflow of professional poster
designers. It is organized into four stages of collaborating, specialized
agents: (1) Parser and Curator agents extract content from the paper and
organize it into a storyboard; (2) a Layout agent maps the content into a
coherent spatial layout; (3) Stylist agents apply visual design elements such
as color and typography; and (4) a Renderer composes the final poster.
Together, these agents produce posters that
are both semantically grounded and visually appealing. To evaluate design
quality, we introduce a vision-language model (VLM)-based rubric that measures
layout balance, readability, and aesthetic coherence. Experimental results show
that PosterGen consistently matches existing methods in content fidelity and
significantly outperforms them in visual design, generating posters that are
presentation-ready with minimal human refinement.
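The four-stage workflow described above can be pictured as a sequential hand-off of shared state from one agent to the next. The following is a toy sketch under stated assumptions, not PosterGen's implementation. The `PosterState` fields, the agent function names, the column-filling layout heuristic, the placeholder palette, and the SVG output are all hypothetical. Each agent is a deterministic stub at the point where the actual system would call an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class PosterState:
    """Hypothetical shared state passed from one agent to the next."""
    paper_text: str
    storyboard: list[str] = field(default_factory=list)                   # Parser/Curator output
    boxes: list[tuple[int, int, int, int]] = field(default_factory=list)  # Layout output: (x, y, w, h)
    style: dict[str, str] = field(default_factory=dict)                   # Stylist output
    poster: str = ""                                                      # Renderer output


def parser_curator(state: PosterState) -> PosterState:
    # Stub in place of LLM extraction: each blank-line-separated block becomes one panel.
    state.storyboard = [b.strip() for b in state.paper_text.split("\n\n") if b.strip()]
    return state


def layout_agent(state: PosterState, width: int = 1200, height: int = 900,
                 columns: int = 3) -> PosterState:
    # Hypothetical heuristic: fill columns top to bottom so reading order follows the storyboard.
    n = len(state.storyboard)
    per_col = max(1, -(-n // columns))  # ceiling division, at least one row
    col_w, row_h = width // columns, height // per_col
    state.boxes = []
    for i in range(n):
        col, row = divmod(i, per_col)
        state.boxes.append((col * col_w, row * row_h, col_w, row_h))
    return state


def stylist(state: PosterState) -> PosterState:
    # Placeholder palette and font; a real Stylist agent would choose these.
    state.style = {"accent": "#1f4e79", "font": "Helvetica"}
    return state


def renderer(state: PosterState) -> PosterState:
    # Emit a minimal SVG with one outlined rectangle per storyboard panel.
    rects = "".join(
        f'<rect x="{x}" y="{y}" width="{w}" height="{h}" fill="none" stroke="{state.style["accent"]}"/>'
        for x, y, w, h in state.boxes
    )
    state.poster = (
        f'<svg xmlns="http://www.w3.org/2000/svg" font-family="{state.style["font"]}">{rects}</svg>'
    )
    return state


def run_pipeline(paper_text: str) -> PosterState:
    # Run the four stages in order, each consuming the previous stage's output.
    state = PosterState(paper_text)
    for agent in (parser_curator, layout_agent, stylist, renderer):
        state = agent(state)
    return state
```

For example, `run_pipeline("Intro\n\nMethod\n\nResults\n\nConclusion")` yields four storyboard panels placed in two rows across the first two of three columns. Keeping every stage as a function of one shared state object mirrors the abstract's point that later agents (Stylist, Renderer) build on the earlier agents' content and layout decisions rather than starting over.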