Co-Director: Agentic Generative Video Storytelling
April 27, 2026
Authors: Yale Song, Yiwen Song, Nick Losier, Nathan Hodson, Ye Jin, Rhyard Zhu, Yan Xu, Daniel Vlasic, Carina Claassen, Jasmine Leon, Khanh G. LeViet, Zack Chomyn, Joe Timmons, Brett Slatkin, Scott Penberthy, Tomas Pfister
cs.AI
Abstract
While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present Co-Director, a hierarchical multi-agent framework formalizing video storytelling as a global optimization problem. To ensure semantic coherence, we introduce hierarchical parameterization: a multi-armed bandit globally identifies promising creative directions, while a local multimodal self-refinement loop mitigates identity drift and ensures sequence-level consistency. This balances the exploration of novel narrative strategies with the exploitation of effective creative configurations. For evaluation, we introduce GenAD-Bench, a 400-scenario dataset of fictional products for personalized advertising. Experiments demonstrate that Co-Director significantly outperforms state-of-the-art baselines, offering a principled approach that seamlessly generalizes to broader cinematic narratives. Project Page: https://co-director-agent.github.io/
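The abstract says a multi-armed bandit balances exploring new creative directions against exploiting ones that already work, but it does not say which bandit algorithm Co-Director uses. As a rough illustration only, the sketch below assumes UCB1, treats each arm as one hypothetical creative direction, and uses a caller-supplied `reward_fn` as a stand-in for whatever quality score the framework assigns to a generated video. The names `ucb1_select`, `run_bandit`, and `reward_fn` are illustrative and do not come from the paper.

```python
import math


def ucb1_select(counts, totals, c=math.sqrt(2)):
    """Pick the next arm under UCB1 (assumed here; the paper's choice is not stated)."""
    # Pull every arm once before the confidence bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)
    # Score = empirical mean reward (exploitation) + confidence bonus (exploration).
    scores = [
        totals[a] / counts[a] + c * math.sqrt(math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(reward_fn, n_arms, rounds):
    """Run UCB1 for a fixed number of rounds and return pull counts and reward sums."""
    counts = [0] * n_arms    # how often each creative direction was tried
    totals = [0.0] * n_arms  # cumulative reward per creative direction
    for _ in range(rounds):
        arm = ucb1_select(counts, totals)
        reward = reward_fn(arm)  # hypothetical quality score in [0, 1]
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals
```

Calling `run_bandit(lambda arm: [0.1, 0.9, 0.4][arm], 3, 300)` with these made-up, deterministic scores concentrates most pulls on the second direction while still sampling the other two occasionally, which is the exploration and exploitation trade-off the abstract describes.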