EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion

July 22, 2025
Authors: Shang Liu, Chenjie Cao, Chaohui Yu, Wen Qian, Jing Wang, Fan Wang
cs.AI

Abstract

Despite remarkable recent progress in 3D generation, scaling these methods to geographic extents, such as modeling thousands of square kilometers of Earth's surface, remains an open challenge. We address this through a dual innovation in data infrastructure and model architecture. First, we introduce Aerial-Earth3D, the largest 3D aerial dataset to date, consisting of 50k curated scenes (each covering 600 m × 600 m) captured across the U.S. mainland and comprising 45M multi-view Google Earth frames. Each scene provides multi-view images with camera poses, depth maps, normal maps, and semantic segmentation, with explicit quality control to ensure terrain diversity. Building on this foundation, we propose EarthCrafter, a framework tailored for large-scale 3D Earth generation via sparse-decoupled latent diffusion. Our architecture separates structural and textural generation: 1) dual sparse 3D-VAEs compress high-resolution geometric voxels and textural 2D Gaussian Splats (2DGS) into compact latent spaces, substantially reducing the computational cost imposed by vast geographic scales while preserving critical information; 2) condition-aware flow matching models, trained on mixed inputs (semantics, images, or neither), flexibly model latent geometry and texture features independently. Extensive experiments demonstrate that EarthCrafter performs substantially better in extremely large-scale generation. The framework further supports versatile applications, from semantic-guided urban layout generation to unconditional terrain synthesis, while maintaining geographic plausibility through the rich data priors of Aerial-Earth3D. Our project page is available at https://whiteinblue.github.io/earthcrafter/
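
To make the idea of flow matching trained on mixed inputs more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear interpolation path between Gaussian noise and a latent vector, and it assumes that one of the three condition modes named in the abstract is drawn per training sample. The function names, the uniform mode weights, and the flat-list latent representation are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical mode labels; the abstract only says "semantics, images, or neither".
CONDITION_MODES = ("semantic", "image", "none")


def sample_condition_mode(rng, weights=(1.0, 1.0, 1.0)):
    """Pick which conditioning signal a training sample sees (weights are assumed)."""
    return rng.choices(CONDITION_MODES, weights=weights, k=1)[0]


def flow_matching_pair(latent, noise, t):
    """Linear-path flow matching (an assumed formulation).

    Returns the interpolated point x_t = (1 - t) * noise + t * latent
    and the regression target, the constant velocity latent - noise.
    """
    x_t = [(1.0 - t) * n + t * z for n, z in zip(noise, latent)]
    velocity = [z - n for n, z in zip(noise, latent)]
    return x_t, velocity


def make_training_example(latent, rng):
    """Build one training tuple: noisy latent, time, target velocity, condition mode."""
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    t = rng.random()  # time sampled uniformly in [0, 1)
    x_t, velocity = flow_matching_pair(latent, noise, t)
    return {"x_t": x_t, "t": t, "target": velocity, "mode": sample_condition_mode(rng)}


if __name__ == "__main__":
    rng = random.Random(0)
    example = make_training_example([0.5, -1.0, 2.0], rng)
    print(example["mode"], round(example["t"], 3))
```

In a real system the velocity would be predicted by a network that receives `x_t`, `t`, and whichever condition the sampled mode selects; mixing modes during training is what would let a single model serve semantic-guided, image-guided, and unconditional generation.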