AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

Abstract

Despite rapid advances in video generation models, generating coherent storytelling videos that span multiple scenes and characters remains challenging. Current methods often rigidly convert pre-generated keyframes into fixed-length clips, resulting in disjointed narratives and pacing issues. Furthermore, because video generation models are inherently unstable, even a single low-quality clip can significantly degrade the logical coherence and visual continuity of the entire animation. To overcome these obstacles, we introduce AniMaker, a multi-agent framework that enables efficient multi-candidate clip generation and storytelling-aware clip selection, creating globally consistent and story-coherent animation from text input alone. The framework is built around specialized agents, including the Director Agent for storyboard generation, the Photography Agent for video clip generation, the Reviewer Agent for evaluation, and the Post-Production Agent for editing and voiceover. Central to AniMaker's approach are two key technical components. The first is MCTS-Gen in the Photography Agent, an efficient strategy inspired by Monte Carlo Tree Search (MCTS) that intelligently navigates the candidate space to generate high-potential clips while optimizing resource usage. The second is AniEval in the Reviewer Agent, the first framework specifically designed for multi-shot animation evaluation. AniEval assesses critical aspects such as story-level consistency, action completion, and animation-specific features by considering each clip in the context of its preceding and succeeding clips. Experiments demonstrate that AniMaker achieves superior quality as measured by popular metrics, including VBench and our proposed AniEval framework, while significantly improving the efficiency of multi-candidate generation, pushing AI-generated storytelling animation closer to production standards.
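The abstract describes MCTS-Gen and AniEval only at a high level. The toy sketch below illustrates the general idea of spending a fixed generation budget on a tree search over candidate clips, shot by shot. It is not the authors' implementation; everything in it is a hypothetical stand-in. `Clip`, `Node`, `generate_clip`, `context_score`, and `mcts_clip_search` are invented names. Clips are reduced to a (quality, style) pair. The evaluator only loosely mimics AniEval's idea of judging each clip against its preceding and succeeding clips. The search caps the number of candidates per shot (`max_children`) and scores partial sequences directly instead of running random rollouts.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clip:
    """Hypothetical stand-in for a generated video clip."""
    shot: int       # index of the storyboard shot this clip realizes
    quality: float  # toy per-clip visual quality in [0, 1)
    style: float    # toy appearance value in [0, 1]; closer values = more consistent


@dataclass
class Node:
    clip: Optional[Clip]  # None for the root
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # running sum of backpropagated rewards

    def path(self) -> List[Clip]:
        """Clips from the first shot down to this node."""
        clips, node = [], self
        while node is not None and node.clip is not None:
            clips.append(node.clip)
            node = node.parent
        return clips[::-1]


def generate_clip(shot: int, prev: Optional[Clip], rng: random.Random) -> Clip:
    """Hypothetical generator: style drifts from the previous clip's style."""
    base = prev.style if prev else rng.random()
    style = min(1.0, max(0.0, base + rng.uniform(-0.3, 0.3)))
    return Clip(shot, rng.random(), style)


def context_score(clips: List[Clip]) -> float:
    """Hypothetical evaluator: each clip is scored on its own quality and on
    consistency with its preceding and succeeding neighbors (result in [0, 1])."""
    total = 0.0
    for i, c in enumerate(clips):
        neighbors = [clips[j] for j in (i - 1, i + 1) if 0 <= j < len(clips)]
        consistency = (sum(1 - abs(c.style - n.style) for n in neighbors) / len(neighbors)
                       if neighbors else 1.0)
        total += 0.5 * c.quality + 0.5 * consistency
    return total / len(clips)


def ucb1(child: Node, parent_visits: int, c: float) -> float:
    """Standard UCB1: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts_clip_search(num_shots: int, budget: int, max_children: int = 3,
                     c: float = 1.4, seed: int = 0):
    """Toy MCTS over clip sequences; `budget` limits generate_clip calls."""
    if num_shots < 1:
        raise ValueError("num_shots must be >= 1")
    rng = random.Random(seed)
    root = Node(clip=None)
    best_path, best_score, generations = [], -1.0, 0
    for _ in range(4 * budget):  # iteration cap guards against a fully expanded tree
        if generations >= budget:
            break
        # 1. Selection: descend through fully expanded nodes using UCB1.
        node, depth = root, 0
        while depth < num_shots and len(node.children) >= max_children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits, c))
            depth += 1
        # 2. Expansion: generate one new candidate clip for the next shot.
        if depth < num_shots:
            child = Node(clip=generate_clip(depth, node.clip, rng), parent=node)
            node.children.append(child)
            node = child
            generations += 1
        # 3. Evaluation: score the (possibly partial) sequence directly.
        clips = node.path()
        reward = context_score(clips)
        if len(clips) == num_shots and reward > best_score:
            best_path, best_score = clips, reward
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best_path, best_score, generations
```

For example, `mcts_clip_search(num_shots=3, budget=30, seed=42)` returns one clip per shot, the sequence's toy score, and the number of generation calls used. The search is guaranteed to find a complete sequence once the budget exceeds the number of possible non-leaf tree nodes: 3 + 9 = 12 generations for three shots with three candidates each, so any budget of 13 or more suffices. The UCB1 rule sends more of the budget to branches whose earlier clips already score well. This mirrors the resource-saving goal the abstract attributes to MCTS-Gen.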