EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion

Abstract

Despite remarkable recent progress in 3D generation, scaling these methods to geographic extents, such as modeling thousands of square kilometers of Earth's surface, remains an open challenge. We address this through a dual innovation in data infrastructure and model architecture. First, we introduce Aerial-Earth3D, the largest 3D aerial dataset to date, consisting of 50k curated scenes (each measuring 600 m × 600 m) captured across the U.S. mainland and comprising 45M multi-view Google Earth frames. Each scene provides multi-view images with camera poses, depth maps, normals, and semantic segmentation, with explicit quality control to ensure terrain diversity. Building on this foundation, we propose EarthCrafter, a tailored framework for large-scale 3D Earth generation via sparse-decoupled latent diffusion. Our architecture separates structural and textural generation: 1) Dual sparse 3D-VAEs compress high-resolution geometric voxels and textural 2D Gaussian Splats (2DGS) into compact latent spaces, substantially reducing the computational cost imposed by vast geographic scales while preserving critical information. 2) We propose condition-aware flow matching models, trained on mixed inputs (semantics, images, or neither), that flexibly model latent geometry and texture features independently. Extensive experiments demonstrate that EarthCrafter performs substantially better in extremely large-scale generation. The framework further supports versatile applications, from semantic-guided urban layout generation to unconditional terrain synthesis, while maintaining geographic plausibility through the rich data priors of Aerial-Earth3D. Our project page is available at https://whiteinblue.github.io/earthcrafter/
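To make the idea of flow matching trained on mixed inputs more concrete, the following is a minimal sketch of one training step, not the authors' implementation. It assumes a straight-line path between a noise sample and a data latent, with one conditioning mode (semantic, image, or none) drawn at random per sample. All names here (`CONDITION_MODES`, `sample_condition`, `flow_matching_pair`, `toy_velocity_model`) are hypothetical, and the linear "model" is only a stand-in for a real network.

```python
import random

# Hypothetical mode names mirroring the abstract's "semantics, images, or neither".
CONDITION_MODES = ("semantic", "image", "none")


def sample_condition(rng, semantic_feat, image_feat):
    """Draw one conditioning mode; 'none' is represented by an all-zero vector."""
    mode = rng.choice(CONDITION_MODES)
    if mode == "semantic":
        return mode, list(semantic_feat)
    if mode == "image":
        return mode, list(image_feat)
    return mode, [0.0] * len(semantic_feat)


def flow_matching_pair(rng, x1):
    """Return (x_t, t, target velocity) on a straight path from noise x0 to data x1."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]                 # noise endpoint
    t = rng.random()                                        # time in [0, 1)
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]   # point on the path
    target_v = [b - a for a, b in zip(x0, x1)]              # constant path velocity
    return x_t, t, target_v


def toy_velocity_model(weights, x_t, t, cond):
    """Placeholder linear map over [x_t, cond, t]; a real model would be a network."""
    features = x_t + cond + [t]
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def flow_matching_loss(weights, rng, latent, semantic_feat, image_feat):
    """Mean squared error between predicted and target velocity for one sample."""
    mode, cond = sample_condition(rng, semantic_feat, image_feat)
    x_t, t, target_v = flow_matching_pair(rng, latent)
    pred_v = toy_velocity_model(weights, x_t, t, cond)
    loss = sum((p - v) ** 2 for p, v in zip(pred_v, target_v)) / len(latent)
    return mode, loss


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [0.5, -1.0, 2.0]                        # toy stand-in for a VAE latent
    weights = [[0.0] * (3 + 2 + 1) for _ in range(3)]
    mode, loss = flow_matching_loss(weights, rng, latent, [1.0, 0.0], [0.0, 1.0])
    print(mode, loss)
```

Sampling the mode per training example means a single model sees all three settings, which is one plausible way to support both conditional and unconditional generation at inference time. Under the decoupled design described above, a step like this would presumably be run separately for the geometry and texture latents.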
