Presenting a Paper Is an Art: Self-Improving Aesthetic Strategies for Academic Presentations

Abstract

Promoting academic papers has become an important means of enhancing research visibility. However, existing automated methods struggle with limited storytelling, insufficient aesthetic quality, and constrained self-adjustment, making it difficult to achieve efficient and engaging dissemination. At the heart of these challenges is a simple principle: you cannot improve what you cannot evaluate properly. To address this, we introduce EvoPresent, a self-improvement agent framework that unifies coherent narratives, aesthetic-aware designs, and realistic presentation delivery via virtual characters. Central to EvoPresent is PresAesth, a multi-task reinforcement learning (RL) aesthetic model that provides reliable aesthetic scoring, defect adjustment, and comparative feedback, enabling iterative self-improvement even with limited aesthetic training data. To systematically evaluate these methods, we introduce the EvoPresent Benchmark, a comprehensive benchmark with two components: Presentation Generation Quality, built on 650 top-tier AI conference papers with multimodal resources (slides, videos, and scripts) to assess both content and design; and Aesthetic Awareness, consisting of 2,000 slide pairs with varying aesthetic levels, supporting joint training and evaluation on scoring, defect adjustment, and comparison. Our findings highlight that (i) high-quality feedback is essential for agent self-improvement, while initial capability alone does not guarantee effective self-correction; (ii) automated generation pipelines exhibit a trade-off between visual design and content construction; and (iii) multi-task RL training shows stronger generalization in aesthetic awareness tasks.