One Sentence, One Drama: Personalized Short-Drama Generation via a Multi-Agent System

Abstract

Existing approaches to digital short-drama production typically rely on one-shot, LLM-generated scripts and loosely coupled pipelines. They fall short on three key requirements of short-drama generation: (1) narrative pacing, producing weak hooks, insufficient escalation, and unengaging endings; (2) spatial consistency, producing scene layouts that drift and character positions that differ across clips; and (3) production-level quality control, leaving extensive manual review and correction to be done at the script and visual stages. We present One Sentence, One Drama, a hierarchical multi-agent framework that turns a user's single-sentence idea into a fully produced short drama through structured intermediate modules and iterative refinement. The framework has three key components: (1) a multi-agent debate-based story generation module that enforces short-drama pacing and narrative coherence; (2) a 3D-grounded first-frame generation mechanism that establishes a shared spatial reference, keeping character positions and scene layouts consistent across clips; and (3) multi-stage reviewer loops that perform comprehensive error detection and targeted revision across the script, visual, and video generation stages. We also introduce scene-level background music (BGM) matching and scene transition planning to improve audience immersion. To evaluate this task systematically, we introduce Short-Drama-Bench, a benchmark that extends standard video quality metrics with short-drama-specific criteria. Experimental results show that our method significantly outperforms existing pipelines in narrative quality, cross-clip consistency, and overall viewing experience.