One Sentence, One Drama: Personalized Short-Drama Generation via a Multi-Agent System

Abstract

Existing approaches to digital short-drama production typically rely on one-shot LLM-generated scripts and loosely coupled pipelines. These approaches fail to satisfy three key requirements of short-drama generation: (1) narrative pacing, so hooks are weak, escalation is insufficient, and endings fall flat; (2) spatial consistency, so scene layouts drift and character positions change across clips; and (3) production-level quality control, so extensive manual review and correction are needed across the script and visual stages. We present One Sentence, One Drama, a hierarchical multi-agent framework that transforms a user's single-sentence idea into a fully produced short drama through structured intermediate modules and iterative refinement. Our approach is built on three key components: (1) a multi-agent debate-based story generation module that enforces short-drama pacing and narrative coherence; (2) a 3D-grounded first-frame generation mechanism that establishes a shared spatial reference for consistent character positioning and scene layout across clips; and (3) multi-stage reviewer loops that perform comprehensive error detection and targeted revision across the script, visual, and video generation stages. We also introduce scene-level background music (BGM) matching and scene transition planning to make the viewing experience more immersive. To evaluate this task systematically, we introduce Short-Drama-Bench, a benchmark that extends standard video quality metrics with short-drama-specific criteria. Experimental results show that our method significantly outperforms existing pipelines in narrative quality, cross-clip consistency, and overall viewing experience.
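The abstract describes the multi-stage reviewer loops only by their behavior: detect errors, revise the flagged parts, and repeat, with separate checks at the script, visual, and video stages. The following is a minimal sketch of that control flow under stated assumptions. It is not the authors' implementation. It assumes that an intermediate artifact can be represented as a dict of named units (for example, scenes or clips), that each reviewer returns the keys it flags, and that a reviser rewrites one unit at a time. The names `Artifact`, `Stage`, `review_loop`, `run_stages`, and the `max_rounds` budget are hypothetical. In a real system, the generators, reviewers, and revisers would be LLM or vision-model agents rather than the toy functions shown here.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical representation: an intermediate artifact (script, first frames,
# video clips) is a dict mapping unit names (e.g. "scene1") to their content.
Artifact = dict[str, str]
Reviewer = Callable[[Artifact], list[str]]   # returns keys of flagged units
Reviser = Callable[[str, str], str]          # (key, content) -> revised content


@dataclass
class StageResult:
    artifact: Artifact
    revisions: int          # number of revision passes applied
    unresolved: list[str]   # keys still flagged when the budget ran out


@dataclass
class Stage:
    name: str
    generate: Callable[[Artifact], Artifact]  # builds this stage's draft from the previous output
    review: Reviewer
    revise: Reviser


def review_loop(artifact: Artifact, review: Reviewer, revise: Reviser,
                max_rounds: int = 3) -> StageResult:
    """Detect errors, revise only the flagged units, and repeat until the
    reviewer is satisfied or max_rounds revision passes have been spent."""
    current = dict(artifact)  # copy so the caller's draft is not mutated
    for revisions in range(max_rounds):
        flagged = review(current)
        if not flagged:
            return StageResult(current, revisions, [])
        # Targeted revision: units the reviewer did not flag are kept as-is.
        for key in flagged:
            current[key] = revise(key, current[key])
    return StageResult(current, max_rounds, review(current))


def run_stages(seed: Artifact, stages: list[Stage],
               max_rounds: int = 3) -> dict[str, StageResult]:
    """Run stages in order (e.g. script -> visual -> video); each stage's
    reviewed output becomes the input to the next stage's generator."""
    results: dict[str, StageResult] = {}
    artifact = seed
    for stage in stages:
        draft = stage.generate(artifact)
        result = review_loop(draft, stage.review, stage.revise, max_rounds)
        results[stage.name] = result
        artifact = result.artifact
    return results


if __name__ == "__main__":
    # Toy stand-ins for LLM agents: the reviewer flags any unit containing
    # "TODO", and the reviser fills it in.
    def flag_todo(a: Artifact) -> list[str]:
        return [k for k, v in a.items() if "TODO" in v]

    def fix(key: str, text: str) -> str:
        return text.replace("TODO", "OK")

    stages = [
        Stage("script", lambda a: {k: v + " TODO" for k, v in a.items()}, flag_todo, fix),
        Stage("visual", lambda a: {k: f"frame({v})" for k, v in a.items()}, flag_todo, fix),
    ]
    for name, res in run_stages({"scene1": "idea"}, stages).items():
        print(name, res.artifact, res.revisions, res.unresolved)
```

In this sketch, `max_rounds` caps the number of revision passes, so an error the reviser cannot fix does not loop forever. Any unit that is still flagged is returned in `unresolved` so it can be escalated instead of passed silently to the next stage.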