Nobody Knows the State of the Art in Geospatial Foundation Models

Abstract

Geospatial foundation models (GFMs) have been proposed as generalizable backbones for disaster response, land-cover mapping, food-security monitoring, and other high-stakes Earth-observation tasks. Yet the published work on these models does not give reviewers or users enough information to tell which model fits a given task. We argue that nobody knows what the current state of the art in geospatial foundation models is. The methods may be useful, but the GFM literature does not standardize evaluations, training and testing protocols, weight releases, or pretraining controls well enough for anyone to compare or rank models. In an audit of 152 papers, we find 46 cross-paper disagreements of at least 10 points for the same model, benchmark, and protocol; 94 of the 126 papers with extractable pretraining data use a configuration that no other paper uses; and 39% of GFM papers release no model weights. This lack of community standards is fixable. We propose six concrete expectations: weight release under a named license, a shared core set of evaluations, annotation of baselines as copied or rerun, variance reporting, a single shared evaluation harness, and controls that separate the effects of data, architecture, and algorithm. These gaps reflect a coordination failure, not the fault of any individual lab; the authors of this paper, like many others in the GFM community, have contributed to them. Rather than only critiquing the community, we aim to provide concrete steps toward a shared understanding of how to advance GFMs.