The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation

January 25, 2026
Authors: Chenyu Mu, Xin He, Qu Yang, Wanshun Chen, Jiadi Yao, Huang Liu, Zihao Yi, Bo Zhao, Xingyu Chen, Ruotian Ma, Fanghua Ye, Erkun Yang, Cheng Deng, Zhaopeng Tu, Xiaolong Li, Linus
cs.AI

Abstract

Recent advances in video generation have produced models capable of synthesizing stunning visual content from simple text prompts. However, these models struggle to generate long-form, coherent narratives from high-level concepts like dialogue, revealing a "semantic gap" between a creative idea and its cinematic execution. To bridge this gap, we introduce a novel, end-to-end agentic framework for dialogue-to-cinematic-video generation. Central to our framework is ScripterAgent, a model trained to translate coarse dialogue into a fine-grained, executable cinematic script. To enable this, we construct ScriptBench, a new large-scale benchmark with rich multimodal context, annotated via an expert-guided pipeline. The generated script then guides DirectorAgent, which orchestrates state-of-the-art video models using a cross-scene continuous generation strategy to ensure long-horizon coherence. Our comprehensive evaluation, featuring an AI-powered CriticAgent and a new Visual-Script Alignment (VSA) metric, shows our framework significantly improves script faithfulness and temporal fidelity across all tested video models. Furthermore, our analysis uncovers a crucial trade-off in current SOTA models between visual spectacle and strict script adherence, providing valuable insights for the future of automated filmmaking.
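The abstract describes a two-stage data flow: ScripterAgent turns dialogue into an executable script, and DirectorAgent feeds that script to a video model scene by scene. The Python below is a minimal, hypothetical sketch of that flow only. Every name in it (`Shot`, `Clip`, `scripter_agent`, `director_agent`, `fake_video_model`) is an illustrative assumption, not the authors' code. In the paper, ScripterAgent is a model trained on ScriptBench, and DirectorAgent drives real video models. Here both are replaced by hard-coded stubs. The sketch also assumes that "cross-scene continuous generation" means conditioning each new clip on the final frame of the previous clip. The abstract does not specify the mechanism.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Shot:
    # One executable unit of the cinematic script (hypothetical fields).
    scene_id: int
    camera: str
    action: str
    line: str


@dataclass
class Clip:
    scene_id: int
    prompt: str
    init_frame: Optional[str]  # frame this clip was conditioned on (None for the first)
    last_frame: str            # frame handed on to the next scene


# Hypothetical video-model interface: (prompt, conditioning frame) -> last frame id.
VideoModel = Callable[[str, Optional[str]], str]


def scripter_agent(dialogue: List[str]) -> List[Shot]:
    """Toy stand-in for ScripterAgent: expand each 'SPEAKER: text' line into a shot.
    The real ScripterAgent is a trained model; this rule is only for illustration."""
    shots = []
    for i, utterance in enumerate(dialogue):
        speaker, _, text = utterance.partition(":")
        shots.append(Shot(scene_id=i, camera="medium close-up",
                          action=f"{speaker.strip()} speaks", line=text.strip()))
    return shots


def fake_video_model(prompt: str, init_frame: Optional[str]) -> str:
    # Placeholder generator: returns an id for the clip's final frame instead of pixels.
    return f"end_of[{prompt}]"


def director_agent(shots: List[Shot], video_model: VideoModel) -> List[Clip]:
    """Generate clips in script order, passing each clip's last frame into the next call
    (our assumed reading of cross-scene continuous generation)."""
    clips: List[Clip] = []
    prev_frame: Optional[str] = None
    for shot in shots:
        prompt = f"{shot.camera}; {shot.action}; says '{shot.line}'"
        last = video_model(prompt, prev_frame)
        clips.append(Clip(shot.scene_id, prompt, prev_frame, last))
        prev_frame = last
    return clips


if __name__ == "__main__":
    script = scripter_agent(["ALICE: Where were you?", "BOB: Stuck in traffic."])
    for clip in director_agent(script, fake_video_model):
        print(clip.scene_id, clip.init_frame, "->", clip.last_frame)
```

Keeping the video model behind a plain callable means any generator with the same (prompt, conditioning frame) shape could be swapped in. That mirrors the abstract's claim that the framework wraps several different video models. The explicit `prev_frame` hand-off is what links consecutive scenes in this sketch.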
January 28, 2026