Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
October 7, 2025
Authors: Chengzhi Liu, Yuzhe Yang, Kaiwen Zhou, Zhen Zhang, Yue Fan, Yannan Xie, Peng Qi, Xin Eric Wang
cs.AI
Abstract
The promotion of academic papers has become an important means of enhancing
research visibility. However, existing automated methods struggle with limited
storytelling, insufficient aesthetic quality, and constrained self-adjustment,
making it difficult to achieve efficient and engaging dissemination. At the
heart of these challenges lies a simple principle: you cannot improve what you
cannot evaluate correctly. To address this, we introduce
EvoPresent, a self-improvement agent framework that unifies coherent
narratives, aesthetic-aware designs, and realistic presentation delivery via
virtual characters. Central to EvoPresent is PresAesth, a multi-task
reinforcement learning (RL) aesthetic model that provides reliable aesthetic
scoring, defect adjustment, and comparative feedback, enabling iterative
self-improvement even under limited aesthetic training data. To systematically
evaluate these methods, we introduce the EvoPresent Benchmark, a
comprehensive benchmark comprising: Presentation Generation Quality,
built on 650 top-tier AI conference papers with multimodal resources (slides,
videos, and scripts) to assess both content and design; and Aesthetic
Awareness, consisting of 2,000 slide pairs with varying aesthetic levels,
supporting joint training and evaluation on scoring, defect adjustment, and
comparison. Our findings highlight that (i) high-quality feedback is essential
for agent self-improvement, while initial capability alone does not guarantee
effective self-correction; (ii) automated generation pipelines exhibit a
trade-off between visual design and content construction; and (iii) multi-task
RL training shows stronger generalization in aesthetic awareness tasks.
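The abstract describes feedback-driven self-improvement only at a high level. The Python sketch below illustrates the general idea of critic-driven iterative refinement under our own assumptions. `Feedback`, `self_improve`, the 10-point score scale, the stopping threshold, and the toy critic and reviser are all hypothetical stand-ins. They are not PresAesth's actual interface, nor EvoPresent's actual algorithm. The critic returns a score plus a list of defects (loosely mirroring the "scoring" and "defect adjustment" feedback named above), and a revision is kept only if it scores strictly higher.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    """Hypothetical critic output: a scalar score plus a list of detected defects."""
    score: float
    defects: List[str]


def self_improve(
    draft: str,
    critic: Callable[[str], Feedback],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 5,
    target: float = 8.0,
) -> Tuple[str, float]:
    """Revise a draft using critic feedback, accepting only revisions that score higher."""
    best, best_fb = draft, critic(draft)
    for _ in range(max_rounds):
        # Stop once the draft is good enough or the critic reports nothing to fix.
        if best_fb.score >= target or not best_fb.defects:
            break
        candidate = revise(best, best_fb.defects)
        fb = critic(candidate)
        # Reject a revision that does not improve the score, then stop.
        if fb.score <= best_fb.score:
            break
        best, best_fb = candidate, fb
    return best, best_fb.score


# Toy stand-ins (hypothetical): a "slide" is a string of tokens, and the critic
# deducts 2 points from a 10-point scale for each known defect token it finds.
KNOWN_DEFECTS = ["clutter", "lowcontrast", "overflow"]


def toy_critic(slide: str) -> Feedback:
    tokens = slide.split()
    defects = [d for d in KNOWN_DEFECTS if d in tokens]
    return Feedback(score=10.0 - 2.0 * len(defects), defects=defects)


def toy_revise(slide: str, defects: List[str]) -> str:
    # Fix one defect per round by dropping the first reported defect token.
    return " ".join(t for t in slide.split() if t != defects[0])


result, score = self_improve(
    "title clutter lowcontrast bullets", toy_critic, toy_revise, target=10.0
)
print(result, score)  # title bullets 10.0
```

Because the loop keeps only strictly better drafts, a noisy or miscalibrated critic can stall it early or steer it toward the wrong fixes. This is one way to see why the quality of the feedback can matter as much as the generator's initial capability.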