SmartDirector: Keyframe-Conditioned Cinematic Video Generation via Narrative Pacing Control

Abstract

The narrative quality of a video fundamentally determines its perceptual value. Existing video generation methods can produce visually appealing content, but they rely predominantly on sparse conditioning signals such as text prompts or first/last frames, which limits precise control over narrative structure and temporal pacing. In this paper, we propose SmartDirector, a framework that enhances the narrative capacity of video generation models through multiple keyframes. SmartDirector supports flexible generation scenarios, including single-shot generation, multi-shot narrative synthesis, and video extension. The framework operates in two stages: Director-Gen generates a low-resolution video conditioned on the provided keyframes, and Director-SR refines that output by using high-resolution keyframes as semantic anchors to recover fine-grained details. To enable robust multi-keyframe training, we construct a data pipeline that curates single-shot and multi-shot sequences from movies. Extensive experiments demonstrate that SmartDirector substantially outperforms existing state-of-the-art approaches. We will release the code to facilitate further research.