A Survey of Interactive Generative Video

Abstract

Interactive Generative Video (IGV) has emerged as a crucial technology in response to the growing demand for high-quality, interactive video content across various domains. In this paper, we define IGV as a technology that combines generative capabilities to produce diverse high-quality video content with interactive features that enable user engagement through control signals and responsive feedback. We survey the current landscape of IGV applications, focusing on three major domains: 1) gaming, where IGV enables infinite exploration in virtual worlds; 2) embodied AI, where IGV serves as a physics-aware environment synthesizer for training agents in multimodal interaction with dynamically evolving scenes; and 3) autonomous driving, where IGV provides closed-loop simulation capabilities for safety-critical testing and validation. To guide future development, we propose a comprehensive framework that decomposes an ideal IGV system into five essential modules: Generation, Control, Memory, Dynamics, and Intelligence. Furthermore, we systematically analyze the technical challenges and future directions in realizing each component for an ideal IGV system, such as achieving real-time generation, enabling open-domain control, maintaining long-term coherence, simulating accurate physics, and integrating causal reasoning. We believe that this systematic analysis will facilitate future research and development in the field of IGV, ultimately advancing the technology toward more sophisticated and practical applications.
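The five-module decomposition is easier to picture with a toy example. The following is a minimal sketch in Python. It assumes a hypothetical one-dimensional side-scrolling world in which text strings stand in for video frames. All names (`WorldState`, `Control`, `Dynamics`, `Memory`, `Generator`, `Intelligence`, `IGVSystem`) and the key bindings are illustrative assumptions and are not defined by the survey. Each module is a hand-written stand-in for what would, in practice, presumably be a learned component.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple

Action = Tuple[int, int]  # (horizontal move, jump impulse); hypothetical action space


@dataclass(frozen=True)
class WorldState:
    """Toy world state: horizontal position x, height y, vertical velocity vy."""
    x: int = 0
    y: int = 0
    vy: int = 0


class Control:
    """Control: maps raw user input to an action via hypothetical key bindings."""
    BINDINGS = {"left": (-1, 0), "right": (1, 0), "jump": (0, 2), "noop": (0, 0)}

    def encode(self, user_input: str) -> Action:
        # Unknown inputs fall back to a no-op instead of raising.
        return self.BINDINGS.get(user_input, (0, 0))


class Dynamics:
    """Dynamics: hand-written toy physics (walls plus gravity), not a learned model."""

    def __init__(self, width: int = 10, gravity: int = 1):
        self.width = width
        self.gravity = gravity

    def step(self, s: WorldState, action: Action) -> WorldState:
        dx, jump = action
        x = min(max(s.x + dx, 0), self.width - 1)  # clamp to the world's walls
        vy = jump if (s.y == 0 and jump > 0) else s.vy  # jump only from the ground
        y = s.y + vy
        vy -= self.gravity
        if y <= 0:  # hit the ground: stop falling
            y, vy = 0, 0
        return WorldState(x, y, vy)


class Memory:
    """Memory: bounded history of past states that later steps can consult."""

    def __init__(self, capacity: int = 64):
        self.history: Deque[WorldState] = deque(maxlen=capacity)

    def remember(self, state: WorldState) -> None:
        self.history.append(state)

    def last(self) -> Optional[WorldState]:
        return self.history[-1] if self.history else None


class Generator:
    """Generation: renders a state as a text "frame" (a stand-in for a video model)."""

    def __init__(self, width: int = 10):
        self.width = width

    def render(self, s: WorldState) -> str:
        row = ["."] * self.width
        row[s.x] = "@"
        return "".join(row) + f" y={s.y}"


class Intelligence:
    """Intelligence: toy event inference from state transitions, not real causal reasoning."""

    def infer_events(self, prev: Optional[WorldState], curr: WorldState) -> List[str]:
        if prev is None:
            return []
        events = []
        if prev.y == 0 and curr.y > 0:
            events.append("took_off")
        if prev.y > 0 and curr.y == 0:
            events.append("landed")
        return events


class IGVSystem:
    """Wires the five hypothetical modules into one interactive step loop."""

    def __init__(self, width: int = 10, memory_capacity: int = 64):
        self.control = Control()
        self.dynamics = Dynamics(width=width)
        self.memory = Memory(capacity=memory_capacity)
        self.generator = Generator(width=width)
        self.intelligence = Intelligence()
        self.state = WorldState()
        self.memory.remember(self.state)

    def step(self, user_input: str) -> Tuple[str, List[str]]:
        action = self.control.encode(user_input)                   # Control
        prev = self.memory.last()                                  # Memory (read)
        self.state = self.dynamics.step(self.state, action)        # Dynamics
        events = self.intelligence.infer_events(prev, self.state)  # Intelligence
        self.memory.remember(self.state)                           # Memory (write)
        return self.generator.render(self.state), events           # Generation


if __name__ == "__main__":
    igv = IGVSystem()
    for key in ["right", "right", "jump", "noop", "noop", "noop", "noop"]:
        print(*igv.step(key))
```

Each call to `IGVSystem.step` turns one user input into one rendered frame and a list of inferred events. Along the way it reads from memory and then writes back to it. Keeping the modules separate is only a readability choice that mirrors the five names listed in the abstract. An actual IGV model may fold several of them, for example generation and dynamics, into a single network.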