AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

Abstract

Despite rapid advances in video generation models, generating coherent storytelling videos that span multiple scenes and characters remains challenging. Current methods often rigidly convert pre-generated keyframes into fixed-length clips, which produces disjointed narratives and pacing problems. Furthermore, because video generation models are inherently unstable, even a single low-quality clip can significantly degrade the logical coherence and visual continuity of the entire output animation. To overcome these obstacles, we introduce AniMaker, a multi-agent framework that enables efficient multi-candidate clip generation and storytelling-aware clip selection, creating globally consistent and story-coherent animation from text input alone. The framework is structured around specialized agents: the Director Agent for storyboard generation, the Photography Agent for video clip generation, the Reviewer Agent for evaluation, and the Post-Production Agent for editing and voiceover. Central to AniMaker's approach are two key technical components. The first is MCTS-Gen in the Photography Agent, an efficient strategy inspired by Monte Carlo Tree Search (MCTS) that intelligently navigates the candidate space to generate high-potential clips while optimizing resource usage. The second is AniEval in the Reviewer Agent, the first framework specifically designed for multi-shot animation evaluation, which assesses critical aspects such as story-level consistency, action completion, and animation-specific features by considering each clip in the context of its preceding and succeeding clips. Experiments demonstrate that AniMaker achieves superior quality as measured by popular metrics, including VBench and our proposed AniEval framework, while significantly improving the efficiency of multi-candidate generation, pushing AI-generated storytelling animation closer to production standards.
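The abstract does not spell out how MCTS-Gen works, so the following is only a generic illustration of the idea it names: spending a fixed generation budget on the most promising clip candidates with a tree search. It is a minimal sketch using the standard UCB1 rule, not the authors' algorithm. The names `Node`, `generate_and_score`, `ucb`, and `search`, the `width` and `budget` parameters, and the random stand-in for the generator and reviewer are all assumptions made for this example.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One candidate clip; its children are candidates for the next shot."""
    shot: int                       # storyboard shot index (-1 for the root)
    score: float                    # reviewer score in [0, 1) (stubbed below)
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    visits: int = 0
    total: float = 0.0              # sum of rewards backed up through this node


def generate_and_score(parent: Node, rng: random.Random) -> Node:
    """Hypothetical stand-in: a real system would generate a clip that
    continues `parent` and have a reviewer score it in context."""
    return Node(shot=parent.shot + 1, score=rng.random(), parent=parent)


def ucb(node: Node, c: float = 1.4) -> float:
    """Standard UCB1: mean backed-up reward plus an exploration bonus."""
    mean = node.total / node.visits
    bonus = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return mean + bonus


def search(num_shots: int, budget: int, width: int = 3, seed: int = 0):
    """Run `budget` iterations; each makes at most one generation call."""
    rng = random.Random(seed)
    root = Node(shot=-1, score=0.0)
    last = num_shots - 1
    for _ in range(budget):
        node = root
        # Selection: descend through fully expanded nodes by UCB1.
        while node.shot < last and len(node.children) >= width:
            node = max(node.children, key=ucb)
        # Expansion: add one new candidate unless we are at the final shot.
        if node.shot < last:
            child = generate_and_score(node, rng)
            node.children.append(child)
            node = child
        # Backpropagation: credit the clip's score to all of its ancestors.
        reward = node.score
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read out the most-visited chain of clips, one per shot reached.
    path, node = [], root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
        path.append(node)
    return path, root


if __name__ == "__main__":
    chosen, _ = search(num_shots=3, budget=30)
    print([(n.shot, round(n.score, 2)) for n in chosen])
```

In this sketch the reward is simply the score of the clip reached in each iteration. A context-aware reviewer, as described for AniEval, would instead score a clip together with its neighbouring clips, which is outside what this example models.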
