Matrix-3D: Omnidirectional Explorable 3D World Generation

Abstract


Explorable 3D world generation from a single image or text prompt is a cornerstone of spatial intelligence. Recent works leverage video models to achieve wide-scope, generalizable 3D world generation. However, existing approaches often produce scenes of limited scope. In this work, we propose Matrix-3D, a framework that uses a panoramic representation for wide-coverage, omnidirectional explorable 3D world generation, combining conditional video generation with panoramic 3D reconstruction. We first train a trajectory-guided panoramic video diffusion model conditioned on scene mesh renders, enabling high-quality and geometrically consistent scene video generation. To lift the panoramic scene video into a 3D world, we propose two separate methods: (1) a feed-forward large panorama reconstruction model for rapid 3D scene reconstruction, and (2) an optimization-based pipeline for accurate and detailed 3D scene reconstruction. To support effective training, we also introduce the Matrix-Pano dataset, the first large-scale synthetic collection comprising 116K high-quality static panoramic video sequences with depth and trajectory annotations. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art performance in both panoramic video generation and 3D world generation. More details are available at https://matrix-3d.github.io.
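Lifting a panoramic video with depth into 3D involves, at some point, mapping each equirectangular pixel and its depth value to a 3D point, then placing that point in a shared world frame using the camera pose along the trajectory. The Python sketch below illustrates only this geometric step under assumed conventions: y-up axes, longitude in [-π, π) from left to right, depth measured as distance along the viewing ray, and a per-frame camera-to-world rotation and translation. The function names are hypothetical, and this is not the paper's feed-forward reconstruction model or its optimization-based pipeline.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def pixel_to_direction(u: int, v: int, width: int, height: int) -> Point:
    """Unit ray direction for pixel (u, v) of an equirectangular image.

    Assumed convention (not taken from the paper): y is up, longitude spans
    [-pi, pi) left to right, latitude spans [pi/2, -pi/2] top to bottom,
    and pixel centers are offset by +0.5.
    """
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def unproject_panorama(depth: List[List[float]]) -> List[Point]:
    """Lift an equirectangular depth map to camera-space 3D points.

    depth[v][u] is assumed to be the distance along the viewing ray
    (not z-depth); non-positive values are treated as invalid and skipped.
    """
    height = len(depth)
    width = len(depth[0]) if height else 0
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:
                continue  # skip missing or invalid depth
            dx, dy, dz = pixel_to_direction(u, v, width, height)
            points.append((d * dx, d * dy, d * dz))
    return points


def camera_to_world(
    points: List[Point],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Point]:
    """Apply a rigid camera-to-world pose: p_world = R @ p_cam + t."""
    world: List[Point] = []
    for x, y, z in points:
        world.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return world


if __name__ == "__main__":
    # A tiny 2x4 panorama in which every pixel is 2 units from the camera.
    pano_depth = [[2.0] * 4 for _ in range(2)]
    pts = unproject_panorama(pano_depth)
    print(len(pts))
    # Every unprojected point should lie on a sphere of radius 2.
    print(all(abs(math.dist((0.0, 0.0, 0.0), p) - 2.0) < 1e-9 for p in pts))
```

Running the file prints the point count for a toy 2×4 panorama and checks that every point lies 2 units from the camera. Using z-depth instead of ray-length depth, or a different axis convention, would require changing `pixel_to_direction` accordingly.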
