Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
May 27, 2025
Authors: Wei Pang, Kevin Qinghong Lin, Xiangru Jian, Xi He, Philip Torr
cs.AI
Abstract
Academic poster generation is a crucial yet challenging task in scientific
communication, requiring the compression of long-context interleaved documents
into a single, visually coherent page. To address this challenge, we introduce
the first benchmark and metric suite for poster generation, which pairs recent
conference papers with author-designed posters and evaluates outputs on
(i) Visual Quality: semantic alignment with human posters; (ii) Textual
Coherence: language fluency; (iii) Holistic Assessment: six fine-grained aesthetic
and informational criteria scored by a VLM-as-judge; and, notably,
(iv) PaperQuiz: the poster's ability to convey core paper content, as measured by
VLMs answering generated quizzes. Building on this benchmark, we propose
PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline: (a) the Parser
distills the paper into a structured asset library; (b) the Planner aligns
text-visual pairs into a binary-tree layout that preserves reading order and
spatial balance; and (c) the Painter-Commenter loop refines each panel by
executing rendering code and using VLM feedback to eliminate overflow and
ensure alignment. In our comprehensive evaluation, we find that GPT-4o
outputs, though visually appealing at first glance, often exhibit noisy text and
poor PaperQuiz scores, and that reader engagement is the primary
aesthetic bottleneck, as human-designed posters rely largely on visual
semantics to convey meaning. Our fully open-source variants (e.g., based on the
Qwen-2.5 series) outperform existing 4o-driven multi-agent systems across
nearly all metrics, while using 87% fewer tokens. PosterAgent transforms a 22-page paper
into a finalized yet editable .pptx poster for just $0.005. These
findings chart clear directions for the next generation of fully automated
poster-generation models. The code and datasets are available at
https://github.com/Paper2Poster/Paper2Poster.
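
The binary-tree layout mentioned for the Planner can be illustrated with a minimal sketch. This is not the authors' implementation: the `Panel` class, the `weight` field (a stand-in for content size), the `layout` function, the balance-seeking split rule, and the alternating cut direction are all assumptions made for illustration. The sketch recursively bisects a canvas so that panels keep their reading order and each receives area proportional to its weight.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types for illustration; not taken from the Paper2Poster code.
Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Panel:
    name: str      # section title
    weight: float  # assumed proxy for content size; must be positive


def layout(panels: List[Panel], box: Box,
           vertical_cut: bool = True) -> List[Tuple[str, Box]]:
    """Recursively bisect `box`, giving each panel area proportional to its weight.

    Panels are split into a contiguous prefix and suffix, so the input
    (reading) order is preserved in the returned list.
    """
    if len(panels) == 1:
        return [(panels[0].name, box)]

    total = sum(p.weight for p in panels)
    # Pick the split point whose prefix weight is closest to half the total.
    k = min(range(1, len(panels)),
            key=lambda i: abs(sum(p.weight for p in panels[:i]) - total / 2))
    left, right = panels[:k], panels[k:]
    frac = sum(p.weight for p in left) / total

    x, y, w, h = box
    if vertical_cut:   # cut with a vertical line: left and right halves
        first = (x, y, w * frac, h)
        second = (x + w * frac, y, w * (1 - frac), h)
    else:              # cut with a horizontal line: top and bottom halves
        first = (x, y, w, h * frac)
        second = (x, y + h * frac, w, h * (1 - frac))

    # Alternate the cut direction at each level of the tree.
    return (layout(left, first, not vertical_cut)
            + layout(right, second, not vertical_cut))


if __name__ == "__main__":
    sections = [Panel("Intro", 1.0), Panel("Method", 1.0), Panel("Results", 2.0)]
    for name, rect in layout(sections, (0.0, 0.0, 48.0, 36.0)):
        print(name, rect)
```

A real planner would also need to account for figure aspect ratios and text overflow, which the abstract assigns to the Painter-Commenter loop; this sketch covers only the area split.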