InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

Abstract

Generating long-form storytelling videos with consistent visual narratives remains a significant challenge in video synthesis. We present a novel framework, dataset, and model that address three critical limitations: background consistency across shots, seamless multi-subject shot-to-shot transitions, and scalability to hour-long narratives. Our approach introduces a background-consistent generation pipeline that maintains visual coherence across scenes while preserving character identity and spatial relationships. We further propose a transition-aware video synthesis module that generates smooth shot transitions in complex scenarios where multiple subjects enter or exit the frame, going beyond the single-subject limitation of prior work. To support this, we contribute a synthetic dataset of 10,000 multi-subject transition sequences covering underrepresented dynamic scene compositions. On VBench, InfinityStory achieves the highest Background Consistency (88.94), the highest Subject Consistency (82.11), and the best overall average rank (2.80), showing improved stability, smoother transitions, and better temporal coherence.
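To make the reported VBench numbers more concrete, here is a minimal sketch of two computations: a frame-to-frame consistency score and an average rank across metrics. The consistency score follows our reading of the VBench definition. Each frame after the first is compared with both the first frame and the previous frame, and the two similarities are averaged. VBench uses DINO features for Subject Consistency and CLIP features for Background Consistency and reports the result as a percentage. The function names `consistency_score` and `average_rank`, the precomputed feature vectors, and the tie-breaking rule are illustrative assumptions. This is not InfinityStory's evaluation code, and the official VBench implementation may differ in details.

```python
import math
from typing import Dict, List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two (assumed non-zero) feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consistency_score(frame_features: List[Sequence[float]]) -> float:
    """VBench-style temporal consistency of one video (hypothetical helper).

    For every frame t >= 2, average its similarity to the first frame and to
    the previous frame, then average over the T - 1 such frames.
    `frame_features` holds one precomputed embedding per frame; the image
    encoder (e.g. DINO or CLIP) is outside the scope of this sketch.
    """
    if len(frame_features) < 2:
        raise ValueError("need at least two frames")
    first = frame_features[0]
    total = 0.0
    for t in range(1, len(frame_features)):
        cur = frame_features[t]
        prev = frame_features[t - 1]
        total += 0.5 * (_cosine(first, cur) + _cosine(prev, cur))
    return total / (len(frame_features) - 1)


def average_rank(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Mean rank of each method across metrics (hypothetical helper).

    `scores[method][metric]` is a score where higher is better; rank 1 is best.
    Ties use standard competition ranking (1, 1, 3), which is only one of
    several possible conventions. All methods must share the same metrics.
    """
    methods = list(scores)
    metrics = list(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for metric in metrics:
        for m in methods:
            # Rank = 1 + number of methods strictly better on this metric.
            better = sum(1 for o in methods if scores[o][metric] > scores[m][metric])
            totals[m] += better + 1
    return {m: totals[m] / len(metrics) for m in methods}
```

For example, with hypothetical scores `average_rank({"A": {"bg": 0.9, "subj": 0.8}, "B": {"bg": 0.85, "subj": 0.82}})` gives both methods a mean rank of 1.5, because each one wins a single metric. A lower average rank means a method places near the top more consistently across all evaluated dimensions.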
