Paper2Video: Automatic Video Generation from Scientific Papers

October 6, 2025
Authors: Zeyu Zhu, Kevin Qinghong Lin, Mike Zheng Shou
cs.AI

Abstract

Academic presentation videos have become an essential medium for research communication, yet producing them remains highly labor-intensive, often requiring hours of slide design, recording, and editing for a short 2- to 10-minute video. Unlike natural videos, presentation video generation poses distinctive challenges: the input is a research paper containing dense multi-modal information (text, figures, and tables), and generation must coordinate multiple aligned channels such as slides, subtitles, speech, and the presenter's talking head. To address these challenges, we introduce Paper2Video, the first benchmark of 101 research papers paired with author-created presentation videos, slides, and speaker metadata. We further design four tailored evaluation metrics (Meta Similarity, PresentArena, PresentQuiz, and IP Memory) to measure how well a video conveys the paper's information to the audience. Building on this foundation, we propose PaperTalker, the first multi-agent framework for academic presentation video generation. It integrates slide generation with layout refinement via a novel tree-search visual choice, cursor grounding, subtitling, speech synthesis, and talking-head rendering, while parallelizing slide-wise generation for efficiency. Experiments on Paper2Video demonstrate that the presentation videos produced by our approach are more faithful and informative than those of existing baselines, a practical step toward automated, ready-to-use academic video generation. Our dataset, agent, and code are available at https://github.com/showlab/Paper2Video.
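
The abstract's efficiency claim rests on the observation that, once the slide deck is fixed, the per-slide channels (subtitles, speech, cursor track, talking head) can be generated independently and in parallel. The sketch below illustrates that orchestration pattern only; every function and class name in it is a hypothetical stand-in, not the released PaperTalker API.

```python
# Minimal sketch of a slide-wise parallel presentation pipeline.
# All stage functions are hypothetical placeholders, NOT PaperTalker's API.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SlideAssets:
    slide: str       # rendered slide image (layout picked by tree search)
    subtitles: str   # narration script for this slide
    speech: str      # synthesized audio for the script
    cursor: list     # (x, y, t) cursor positions grounded on the slide
    talker: str      # talking-head clip driven by the audio


# Hypothetical stand-ins for the agent stages named in the abstract.
def refine_layout(paper: str, i: int) -> str:
    return f"slide_{i}.png"          # a tree-search visual choice would run here


def write_subtitles(slide: str) -> str:
    return f"narration for {slide}"  # an LLM would draft this


def synthesize_speech(script: str) -> str:
    return "speech.wav"              # a TTS model would run here


def ground_cursor(slide: str, script: str) -> list:
    return [(0.5, 0.5, 0.0)]         # cursor grounding would run here


def render_talker(audio: str) -> str:
    return "talker.mp4"              # a talking-head model would run here


def build_slide(paper: str, i: int) -> SlideAssets:
    """Generate all aligned channels for a single slide."""
    slide = refine_layout(paper, i)
    script = write_subtitles(slide)
    audio = synthesize_speech(script)
    return SlideAssets(slide, script, audio,
                       ground_cursor(slide, script), render_talker(audio))


def build_video(paper: str, n_slides: int) -> list:
    # Slides do not depend on one another, so their generation can be
    # parallelized, which is the slide-wise efficiency idea in the abstract.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: build_slide(paper, i), range(n_slides)))


if __name__ == "__main__":
    for assets in build_video("paper.pdf", n_slides=3):
        print(assets.slide, assets.talker)
```

In a real system the final step would mux the per-slide audio, cursor overlay, and talking-head clips into one timeline; the sketch stops at asset generation because the abstract does not specify how composition is done.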