AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
June 12, 2025
Authors: Haoyuan Shi, Yunxin Li, Xinyu Chen, Longyue Wang, Baotian Hu, Min Zhang
cs.AI
Abstract
Despite rapid advancements in video generation models, generating coherent
storytelling videos that span multiple scenes and characters remains
challenging. Current methods often rigidly convert pre-generated keyframes into
fixed-length clips, resulting in disjointed narratives and pacing issues.
Furthermore, the inherent instability of video generation models means that
even a single low-quality clip can significantly degrade the entire output
animation's logical coherence and visual continuity. To overcome these
obstacles, we introduce AniMaker, a multi-agent framework enabling efficient
multi-candidate clip generation and storytelling-aware clip selection, thus
creating globally consistent and story-coherent animation solely from text
input. The framework is structured around specialized agents, including the
Director Agent for storyboard generation, the Photography Agent for video clip
generation, the Reviewer Agent for evaluation, and the Post-Production Agent
for editing and voiceover. Central to AniMaker's approach are two key technical
components: MCTS-Gen in the Photography Agent, an efficient Monte Carlo Tree
Search (MCTS)-inspired strategy that intelligently navigates the candidate space
to generate high-potential clips while optimizing resource usage; and AniEval in
the Reviewer Agent, the first framework specifically designed for multi-shot
animation evaluation, which assesses critical aspects such as story-level
consistency, action completion, and animation-specific features by considering
each clip in the context of its preceding and succeeding clips. Experiments
demonstrate that AniMaker achieves superior quality as measured by popular
metrics including VBench and our proposed AniEval framework, while
significantly improving the efficiency of multi-candidate generation, pushing
AI-generated storytelling animation closer to production standards.
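The following minimal Python sketch makes the search idea concrete. It applies generic Monte Carlo Tree Search with UCT to choosing one clip per shot. It is not AniMaker's MCTS-Gen or AniEval. The callbacks `generate(shot, parent_clip)` and `evaluate(clip, prev_clip, next_shot)`, the `budget` and `max_children` parameters, and the tree layout (each child is a regeneration conditioned on its parent clip) are all hypothetical assumptions. As a further simplification, the evaluator sees the previously chosen clip and the next shot's description, which stands in for the succeeding clip because that clip has not been generated yet.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical callback signatures (assumptions, not AniMaker's API):
#   generate(shot, parent_clip) -> clip      parent_clip is None at the root
#   evaluate(clip, prev_clip, next_shot) -> score in [0, 1]
Generate = Callable[[str, Optional[str]], str]
Evaluate = Callable[[str, Optional[str], Optional[str]], float]


@dataclass
class Node:
    clip: Optional[str]                      # None for the root (no clip yet)
    parent: Optional["Node"] = None
    score: float = 0.0                       # evaluator score of this clip
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total: float = 0.0                       # sum of backpropagated rewards


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT value: mean reward plus an exploration bonus."""
    exploit = child.total / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def search_shot(shot: str, prev_clip: Optional[str], next_shot: Optional[str],
                generate: Generate, evaluate: Evaluate,
                budget: int = 8, max_children: int = 2) -> str:
    """Spend `budget` generation calls on one shot and return the best clip."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    root = Node(clip=None)
    candidates: List[Node] = []
    for _ in range(budget):
        # 1. Selection: descend through fully expanded nodes by UCT.
        node = root
        while len(node.children) >= max_children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # 2. Expansion: generate one new candidate conditioned on the parent clip.
        child = Node(clip=generate(shot, node.clip), parent=node)
        node.children.append(child)
        # 3. Evaluation: a context-aware score stands in for a random rollout.
        child.score = evaluate(child.clip, prev_clip, next_shot)
        candidates.append(child)
        # 4. Backpropagation: update visits and reward sums up to the root.
        n: Optional[Node] = child
        while n is not None:
            n.visits += 1
            n.total += child.score
            n = n.parent
    return max(candidates, key=lambda nd: nd.score).clip


def make_animation(shots: List[str], generate: Generate, evaluate: Evaluate,
                   budget: int = 8) -> List[str]:
    """Greedy left-to-right pass: each shot sees the clip chosen before it."""
    chosen: List[str] = []
    for i, shot in enumerate(shots):
        prev_clip = chosen[-1] if chosen else None
        next_shot = shots[i + 1] if i + 1 < len(shots) else None
        chosen.append(search_shot(shot, prev_clip, next_shot,
                                  generate, evaluate, budget))
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    count = [0]

    def toy_generate(shot: str, parent_clip: Optional[str]) -> str:
        count[0] += 1
        return f"{shot}#v{count[0]}"

    def toy_evaluate(clip: str, prev_clip: Optional[str],
                     next_shot: Optional[str]) -> float:
        return rng.random()  # placeholder for a real multi-shot evaluator

    print(make_animation(["intro", "chase", "ending"], toy_generate, toy_evaluate))
```

Routing the generation budget through UCT gives promising branches more regenerations and lets weak ones be abandoned early. This is one plausible reading of "optimizing resource usage", and the paper's actual policy may differ.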