PresentBench: A Fine-Grained Rubric-Based Benchmark for Slide Generation
March 7, 2026
Authors: Xin-Sheng Chen, Jiayu Zhu, Pei-lin Li, Hanzheng Wang, Shuojin Yang, Meng-Hao Guo
cs.AI
Abstract
Slides serve as a critical medium for conveying information in presentation-oriented scenarios such as academia, education, and business. Despite their importance, creating high-quality slide decks remains time-consuming and cognitively demanding. Recent advances in generative models, such as Nano Banana Pro, have made automated slide generation increasingly feasible. However, existing evaluations of slide generation are often coarse-grained and rely on holistic judgments, making it difficult to accurately assess model capabilities or track meaningful advances in the field. In practice, the lack of fine-grained, verifiable evaluation criteria poses a critical bottleneck for both research and real-world deployment. In this paper, we propose PresentBench, a fine-grained, rubric-based benchmark for evaluating automated real-world slide generation. It contains 238 evaluation instances, each supplemented with background materials required for slide creation. Moreover, we manually design an average of 54.1 checklist items per instance, each formulated as a binary question, to enable fine-grained, instance-specific evaluation of the generated slide decks. Extensive experiments show that PresentBench provides more reliable evaluation results than existing methods, and exhibits significantly stronger alignment with human preferences. Furthermore, our benchmark reveals that NotebookLM significantly outperforms other slide generation methods, highlighting substantial recent progress in this domain.
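As a rough illustration of how a binary checklist rubric like the one described above could be turned into a score, the sketch below computes the fraction of items passed per instance and then averages across instances. This is an assumption made for illustration only: the abstract does not state how PresentBench aggregates its checklist items, and the names `ChecklistItem`, `instance_score`, and `benchmark_score`, as well as the sample questions, are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ChecklistItem:
    question: str  # binary (yes/no) question about the generated deck
    passed: bool   # verdict for this item (from a human or model judge)


def instance_score(items: list[ChecklistItem]) -> float:
    """Fraction of checklist items answered 'yes' for one instance.

    Assumption: all items are weighted equally.
    """
    if not items:
        raise ValueError("an instance needs at least one checklist item")
    return sum(item.passed for item in items) / len(items)


def benchmark_score(instances: list[list[ChecklistItem]]) -> float:
    """Unweighted mean of the per-instance scores (a macro average).

    Assumption: every instance counts equally, regardless of how many
    checklist items it has.
    """
    return mean(instance_score(items) for items in instances)


# Hypothetical example: two instances with hand-written verdicts.
deck_a = [
    ChecklistItem("Does the deck include a title slide?", True),
    ChecklistItem("Is the main result stated on a slide?", True),
    ChecklistItem("Are all figures captioned?", False),
    ChecklistItem("Does the deck end with a summary?", True),
]
deck_b = [
    ChecklistItem("Does the deck include a title slide?", True),
    ChecklistItem("Is the dataset size reported correctly?", False),
]

score_a = instance_score(deck_a)
score_b = instance_score(deck_b)
overall = benchmark_score([deck_a, deck_b])
```

Averaging per instance first keeps instances with long checklists from dominating the overall score; pooling all items into a single pass rate would be an equally plausible alternative.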