RemoteZero: Geospatial Reasoning with Zero Human Annotations

Abstract

Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent progress has freed the reasoning path from manual curation, allowing models to generate their own reasoning chains. Yet one dependency remains: training is still supervised by human-annotated ground-truth coordinates. The reasoning process is therefore autonomous, but its spatial endpoint is not, which prevents true self-evolution on abundant unlabeled remote sensing data. To break this bottleneck, we introduce RemoteZero, a box-supervision-free framework for geospatial reasoning. RemoteZero is motivated by a simple asymmetry: a multimodal large language model (MLLM) is typically better at verifying whether a region satisfies a query than at directly generating precise coordinates. Leveraging this stronger discriminative ability, RemoteZero replaces geometric supervision with intrinsic semantic verification and enables Group Relative Policy Optimization (GRPO) training without box annotations. The resulting framework further supports iterative self-evolution, allowing the model to improve on unlabeled remote sensing imagery through its own verification signal. Experiments show that RemoteZero achieves competitive performance against strong supervised methods, demonstrating the potential of self-verifying training for geospatial reasoning localization.
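The abstract does not specify the reward function or the exact GRPO variant, so the following is only a minimal sketch of the general idea: candidate boxes sampled for one query are scored by a verifier instead of being compared with a ground-truth box, and the scores are normalized within the group as in one common GRPO formulation. The names `verification_rewards`, `group_relative_advantages`, and `toy_verifier` are hypothetical, the toy verifier merely stands in for an MLLM's judgment that a region satisfies the query, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates; an illustrative convention.
Box = Tuple[float, float, float, float]


def verification_rewards(candidates: List[Box],
                         verifier: Callable[[Box], float]) -> List[float]:
    """Score each sampled box with a verifier; no ground-truth box is used."""
    return [verifier(box) for box in candidates]


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one sampled group (zero mean, unit scale).

    Assumption: population standard deviation plus a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group of four boxes sampled for a single query.
    group: List[Box] = [
        (10.0, 10.0, 50.0, 50.0),
        (12.0, 8.0, 48.0, 52.0),
        (100.0, 100.0, 140.0, 140.0),
        (0.0, 0.0, 5.0, 5.0),
    ]

    # Placeholder scores standing in for an MLLM's confidence that the
    # cropped region satisfies the query (made-up values, not real outputs).
    fake_scores: Dict[Box, float] = dict(zip(group, [0.9, 0.8, 0.2, 0.1]))

    def toy_verifier(box: Box) -> float:
        return fake_scores[box]

    rewards = verification_rewards(group, toy_verifier)
    advantages = group_relative_advantages(rewards)
    for box, adv in zip(group, advantages):
        print(box, round(adv, 3))
```

In this sketch, boxes that the verifier rates above the group average receive positive advantages and the rest receive negative ones, so a policy-gradient update would favor the former without any box annotation. When every box in a group receives the same score, all advantages are zero and the group contributes no learning signal.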

