One Sentence, One Drama: Personalized Short-Drama Generation with a Multi-Agent System

Abstract

Existing approaches to digital short-drama production typically rely on one-shot, LLM-generated scripts and loosely coupled pipelines. They fail to satisfy three key requirements of short-drama generation: (1) narrative pacing, where failures show up as weak hooks, insufficient escalation, and unappealing endings; (2) spatial consistency, where scene layouts drift and character positions become inconsistent across clips; and (3) production-level quality control, where the script and visual stages require extensive manual review and correction. We present One Sentence, One Drama, a hierarchical multi-agent framework that transforms a user's single-sentence idea into a fully produced short drama through structured intermediate modules and iterative refinement. Our approach is built on three key components: (1) a multi-agent, debate-based story generation module that enforces short-drama pacing and narrative coherence; (2) a 3D-grounded first-frame generation mechanism that establishes a shared spatial reference, keeping character positions and scene layout consistent across clips; and (3) multi-stage reviewer loops that perform comprehensive error detection and targeted revision across the script, visual, and video generation stages. We also introduce scene-level BGM matching and scene transition planning to improve the audience's sense of immersion. To evaluate this task systematically, we introduce Short-Drama-Bench, a benchmark that extends standard video quality metrics with short-drama-specific criteria. Experimental results demonstrate that our method significantly outperforms existing pipelines in narrative quality, cross-clip consistency, and overall viewing experience.
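To make the reviewer-loop idea concrete, the following is a minimal sketch of a single review-and-revise loop for one stage. It is not the framework's implementation: the abstract does not specify interfaces, so `Review`, `refine`, `toy_script_review`, `toy_script_revise`, the `max_rounds` limit, and the beat labels are all hypothetical names chosen for illustration. A real system would presumably back the reviewer and reviser with LLM or vision-model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    """Reviewer verdict: an empty issue list means the draft passes."""
    issues: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.issues


def refine(draft: str,
           review: Callable[[str], Review],
           revise: Callable[[str, List[str]], str],
           max_rounds: int = 3) -> str:
    """Alternate review and targeted revision until the reviewer passes
    or the round budget (an assumed safeguard) is exhausted."""
    for _ in range(max_rounds):
        verdict = review(draft)
        if verdict.passed:
            break
        draft = revise(draft, verdict.issues)
    return draft


# Toy stand-ins for a script-stage reviewer and reviser (hypothetical rules):
# the reviewer only checks that each pacing beat is labeled in the draft.
REQUIRED_BEATS = ("HOOK", "ESCALATION", "ENDING")


def toy_script_review(draft: str) -> Review:
    return Review([beat for beat in REQUIRED_BEATS if f"{beat}:" not in draft])


def toy_script_revise(draft: str, issues: List[str]) -> str:
    # Targeted revision: patch only the first reported issue per round.
    return draft + f"\n{issues[0]}: <placeholder>"


if __name__ == "__main__":
    print(refine("HOOK: a stranger returns a lost ring.",
                 toy_script_review, toy_script_revise))
```

The same `refine` function could in principle be reused for the visual and video stages by swapping in different reviewer and reviser callables, which is one plausible reading of "multi-stage" here.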