Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations

Abstract

The promotion of academic papers has become an important means of enhancing research visibility. However, existing automated methods struggle with limited storytelling, insufficient aesthetic quality, and constrained self-adjustment, making it difficult to achieve efficient and engaging dissemination. At the heart of these challenges is a simple principle: you cannot improve what you cannot evaluate correctly. To address this, we introduce EvoPresent, a self-improvement agent framework that unifies coherent narratives, aesthetic-aware designs, and realistic presentation delivery via virtual characters. Central to EvoPresent is PresAesth, a multi-task reinforcement learning (RL) aesthetic model that provides reliable aesthetic scoring, defect adjustment, and comparative feedback, enabling iterative self-improvement even with limited aesthetic training data. To evaluate these methods systematically, we introduce the EvoPresent Benchmark, a comprehensive benchmark with two parts: Presentation Generation Quality, built on 650 top-tier AI conference papers with multimodal resources (slides, videos, and scripts) to assess both content and design; and Aesthetic Awareness, consisting of 2,000 slide pairs with varying aesthetic levels, supporting joint training and evaluation on scoring, defect adjustment, and comparison. Our findings highlight that (i) high-quality feedback is essential for agent self-improvement, while initial capability alone does not guarantee effective self-correction; (ii) automated generation pipelines exhibit a trade-off between visual design and content construction; and (iii) multi-task RL training shows stronger generalization in aesthetic awareness tasks.

