EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion

Abstract

Despite the remarkable progress of recent 3D generation methods, scaling them to geographic extents, such as modeling thousands of square kilometers of Earth's surface, remains an open challenge. We address this through a dual innovation in data infrastructure and model architecture. First, we introduce Aerial-Earth3D, the largest 3D aerial dataset to date, consisting of 50k curated scenes (each measuring 600 m × 600 m) captured across the U.S. mainland and comprising 45M multi-view Google Earth frames. Each scene provides pose-annotated multi-view images, depth maps, normal maps, and semantic segmentation, with explicit quality control to ensure terrain diversity. Building on this foundation, we propose EarthCrafter, a tailored framework for large-scale 3D Earth generation via sparse-decoupled latent diffusion. Our architecture separates structural and textural generation: 1) Dual sparse 3D-VAEs compress high-resolution geometric voxels and textural 2D Gaussian Splats (2DGS) into compact latent spaces, substantially reducing the computational cost imposed by vast geographic scales while preserving critical information. 2) Condition-aware flow matching models, trained on mixed inputs (semantics, images, or neither), flexibly model latent geometry and texture features independently. Extensive experiments demonstrate that EarthCrafter performs substantially better in extremely large-scale generation. The framework further supports versatile applications, from semantic-guided urban layout generation to unconditional terrain synthesis, while maintaining geographic plausibility through the rich data priors of Aerial-Earth3D. Our project page is available at https://whiteinblue.github.io/earthcrafter/
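
To make the mixed-input training in item 2 concrete, the following is a minimal sketch of a flow matching loss in which the conditioning signal is drawn at random from semantics, image, or none at each training step. It is not the authors' implementation: the linear interpolation path, the uniform choice among modes, and all names (`CONDITION_MODES`, `flow_matching_loss`, the `model` callable, the `cond` dictionary) are assumptions made for illustration, and latents are plain Python lists rather than sparse 3D tensors.

```python
import random

# Assumed condition modes, mirroring "semantics, images, or neither" in the abstract.
CONDITION_MODES = ("semantics", "image", "none")


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; constant in t for linear interpolation."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(model, x1, cond, rng):
    """One-sample flow matching loss with a randomly chosen condition mode.

    model: hypothetical callable (x_t, t, condition_or_None) -> predicted velocity
    x1:    data latent (list of floats)
    cond:  dict that may hold "semantics" and/or "image" entries
    rng:   random.Random instance, so the sketch is reproducible
    """
    mode = rng.choice(CONDITION_MODES)      # assumption: uniform over modes
    c = cond.get(mode)                      # None when mode == "none"
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                        # time in [0, 1)
    x_t = interpolate(x0, x1, t)
    v_pred = model(x_t, t, c)
    v_true = target_velocity(x0, x1)
    # Mean squared error between predicted and target velocity.
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [0.5, -1.0, 2.0]
    conditions = {"semantics": [1, 0, 2], "image": [0.1, 0.2, 0.3]}
    # A placeholder model that always predicts zero velocity.
    zero_model = lambda x_t, t, c: [0.0] * len(x_t)
    print(flow_matching_loss(zero_model, latent, conditions, rng))
```

Because the same network sees all three modes during training, a single model of this kind can be queried with a semantic map, an image, or no condition at inference time. Under the decoupled design described above, one such model would be trained for the geometry latents and another for the texture latents.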
