Nobody Knows the State of the Art in Geospatial Foundation Models

Abstract

Geospatial foundation models (GFMs) have been proposed as generalizable backbones for disaster response, land-cover mapping, food-security monitoring, and other high-stakes Earth-observation tasks. Yet the published work on these models does not give reviewers or users enough information to tell which model fits a given task. We argue that nobody knows what the current state of the art in geospatial foundation models is. The methods may be useful, but the GFM literature does not standardize evaluations, training and testing protocols, weight releases, or pretraining controls well enough for anyone to compare or rank models. In an audit of 152 papers, we find 46 cross-paper disagreements of at least 10 points for the same model, benchmark, and protocol; 94 of the 126 papers with extractable pretraining data (74.6%) use a configuration no other paper uses; and 39% of GFM papers release no model weights. This lack of community standards can be fixed. We propose six concrete expectations: weight release under a named license, a shared core set of evaluations, annotation of baselines as copied or rerun, variance reporting, one shared evaluation harness, and controls that separate the effects of data, architecture, and algorithm. These gaps are a coordination failure, not the fault of any individual lab; the authors of this paper, like many others in the GFM community, have contributed to them. Rather than only critiquing the community, we aim to provide concrete steps toward a shared understanding of how to advance GFMs.
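
To make the notion of a cross-paper disagreement concrete, the following is a minimal Python sketch of how such cases could be flagged. It is an illustration under stated assumptions, not the audit procedure itself: the record layout, the function name `find_disagreements`, and all paper, model, and benchmark names are hypothetical, and the sketch counts disagreeing pairs of papers, which may differ from how the 46 disagreements were counted.

```python
from collections import defaultdict
from itertools import combinations


def find_disagreements(records, threshold=10.0):
    """Flag pairs of papers that report scores at least `threshold` points
    apart for the same (model, benchmark, protocol) triple.

    `records` is an iterable of (paper, model, benchmark, protocol, score)
    tuples. This layout is an assumption made for illustration.
    """
    # Group reported scores by the setting they claim to measure.
    groups = defaultdict(list)
    for paper, model, benchmark, protocol, score in records:
        groups[(model, benchmark, protocol)].append((paper, score))

    flagged = []
    for key, entries in groups.items():
        # Compare every pair of reports within the same setting.
        for (paper_a, score_a), (paper_b, score_b) in combinations(entries, 2):
            gap = abs(score_a - score_b)
            if paper_a != paper_b and gap >= threshold:
                flagged.append((key, paper_a, paper_b, gap))
    return flagged


# Hypothetical records; none of these values come from the audit.
records = [
    ("paper_A", "model_X", "bench_1", "linear_probe", 71.0),
    ("paper_B", "model_X", "bench_1", "linear_probe", 58.5),
    ("paper_C", "model_X", "bench_1", "fine_tune", 80.0),
    ("paper_D", "model_Y", "bench_1", "linear_probe", 66.0),
    ("paper_E", "model_Y", "bench_1", "linear_probe", 63.0),
]

disagreements = find_disagreements(records)
for key, paper_a, paper_b, gap in disagreements:
    print(key, paper_a, paper_b, gap)
```

In this toy input only `paper_A` and `paper_B` are flagged: they report the same model, benchmark, and protocol but differ by 12.5 points. The `fine_tune` result is not compared against the `linear_probe` results because the protocol differs, which is why matching on all three fields matters.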