Nobody Knows the State of the Art in Geospatial Foundation Models

Abstract

Geospatial foundation models (GFMs) have been proposed as general-purpose backbones for disaster response, land-cover mapping, food-security monitoring, and other high-stakes Earth-observation tasks. Yet the published work on these models does not give reviewers or users enough information to tell which model fits a given task. We argue that nobody knows what the current state of the art is in geospatial foundation models. The methods may be useful, but the GFM literature does not standardize evaluations, training and testing protocols, weight releases, or pretraining controls well enough for anyone to compare or rank models. In an audit of 152 papers, we find 46 cross-paper disagreements of at least 10 percentage points for the same model, benchmark, and protocol; 94 of the 126 papers with extractable pretraining data use a configuration that no other paper uses; and 39% of GFM papers release no model weights. This lack of community standards is solvable. We propose six concrete expectations: weight release under a named license, shared core evaluations, annotations marking baselines as copied or rerun, variance reporting, one shared evaluation harness, and controls that separate the effects of data, architecture, and algorithm. These gaps are a coordination failure, not the fault of any individual lab; the authors of this paper, like many others in the GFM community, have contributed to them. Rather than only critiquing the community, we aim to provide concrete steps toward a shared understanding of how to advance GFMs.
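
To make the two audit criteria above concrete, the following is a minimal Python sketch of how a cross-paper disagreement and a unique pretraining configuration might be counted. It is an illustration under stated assumptions, not the audit's actual pipeline: the record layout, the function names `find_disagreements` and `count_unique_configs`, the choice to count disagreements per pair of papers, and every row of example data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records: (paper, model, benchmark, protocol, score in percent).
# These rows are invented for illustration; they are not data from the audit.
REPORTS = [
    ("paper_A", "model_X", "bench_1", "linear_probe", 71.2),
    ("paper_B", "model_X", "bench_1", "linear_probe", 58.9),
    ("paper_C", "model_X", "bench_1", "fine_tune", 80.0),
    ("paper_D", "model_Y", "bench_1", "linear_probe", 65.0),
    ("paper_E", "model_Y", "bench_1", "linear_probe", 66.5),
]

# Hypothetical pretraining configurations: paper -> (data, architecture, algorithm).
CONFIGS = {
    "paper_A": ("dataset_1", "arch_small", "objective_1"),
    "paper_B": ("dataset_1", "arch_small", "objective_1"),
    "paper_C": ("dataset_2", "arch_small", "objective_1"),
    "paper_D": ("dataset_1", "arch_large", "objective_1"),
}


def find_disagreements(reports, threshold=10.0):
    """Flag pairs of papers whose scores for the same (model, benchmark,
    protocol) differ by at least `threshold` percentage points.

    Assumption: disagreements are counted per pair of papers.
    """
    by_key = defaultdict(list)
    for paper, model, benchmark, protocol, score in reports:
        by_key[(model, benchmark, protocol)].append((paper, score))

    flagged = []
    for key, entries in by_key.items():
        for (paper_a, score_a), (paper_b, score_b) in combinations(entries, 2):
            gap = abs(score_a - score_b)
            if paper_a != paper_b and gap >= threshold:
                flagged.append((key, paper_a, paper_b, round(gap, 1)))
    return flagged


def count_unique_configs(configs):
    """Count papers whose pretraining configuration is used by no other paper."""
    usage = Counter(configs.values())
    return sum(1 for config in configs.values() if usage[config] == 1)


if __name__ == "__main__":
    for key, paper_a, paper_b, gap in find_disagreements(REPORTS):
        print(key, paper_a, paper_b, gap)
    print(count_unique_configs(CONFIGS), "of", len(CONFIGS), "configurations are unique")
```

Grouping by the full (model, benchmark, protocol) key matters here: in the toy data, the fine-tuning result for `model_X` is never compared against the linear-probe results, because a change of protocol alone can account for a large gap.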