PresentBench: A Fine-Grained Rubric-Based Benchmark for Slide Generation

Abstract

Slides serve as a critical medium for conveying information in presentation-oriented scenarios such as academia, education, and business. Despite their importance, creating high-quality slide decks remains time-consuming and cognitively demanding. Recent advances in generative models, such as Nano Banana Pro, have made automated slide generation increasingly feasible. However, existing evaluations of slide generation are often coarse-grained and rely on holistic judgments, making it difficult to accurately assess model capabilities or track meaningful advances in the field. In practice, the lack of fine-grained, verifiable evaluation criteria poses a critical bottleneck for both research and real-world deployment. In this paper, we propose PresentBench, a fine-grained, rubric-based benchmark for evaluating automated real-world slide generation. It contains 238 evaluation instances, each supplemented with background materials required for slide creation. Moreover, we manually design an average of 54.1 checklist items per instance, each formulated as a binary question, to enable fine-grained, instance-specific evaluation of the generated slide decks. Extensive experiments show that PresentBench provides more reliable evaluation results than existing methods, and exhibits significantly stronger alignment with human preferences. Furthermore, our benchmark reveals that NotebookLM significantly outperforms other slide generation methods, highlighting substantial recent progress in this domain.