GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization

Abstract

Cross-view localization is the task of estimating a camera's 3-degrees-of-freedom (3-DoF) pose by aligning ground-level images with satellite images. It is crucial for large-scale outdoor applications such as autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. In this work, we propose GeoDistill, a geometry-guided, weakly supervised self-distillation framework that uses teacher-student learning with Field-of-View (FoV)-based masking to enhance local feature learning for robust cross-view localization. In GeoDistill, the teacher model localizes a panoramic image, while the student model predicts locations from a limited-FoV counterpart created by FoV-based masking. By aligning the student's predictions with those of the teacher, the student learns to focus on key features such as lane lines and to ignore textureless regions such as roads. This results in more accurate predictions and reduced uncertainty, regardless of whether the query images are panoramas or limited-FoV images. Our experiments show that GeoDistill significantly improves localization performance across different frameworks. Additionally, we introduce a novel orientation estimation network that predicts relative orientation without requiring precise planar position ground truth. GeoDistill provides a scalable and efficient solution for real-world cross-view localization challenges. Code and models are available at https://github.com/tongshw/GeoDistill.
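
The teacher-student setup described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository for that). It assumes an equirectangular panorama whose columns span 360 degrees of azimuth uniformly, and it assumes a cross-entropy term between teacher and student location scores as the alignment loss; the abstract does not specify the loss. All function names (`fov_column_mask`, `apply_mask`, `distillation_loss`) are hypothetical.

```python
import math


def fov_column_mask(width, fov_deg, center_deg):
    """Per-column keep mask for a panorama (assumption: columns cover
    360 degrees of azimuth uniformly, with column 0 starting at 0 degrees)."""
    half = fov_deg / 2.0
    mask = []
    for col in range(width):
        azimuth = (col + 0.5) * 360.0 / width
        # Signed angular difference in [-180, 180), so the window wraps at 0/360.
        diff = (azimuth - center_deg + 180.0) % 360.0 - 180.0
        mask.append(abs(diff) <= half)
    return mask


def apply_mask(panorama, mask, fill=0.0):
    """Blank out the columns outside the field of view.
    `panorama` is a list of rows, each a list of pixel values."""
    return [[px if keep else fill for px, keep in zip(row, mask)]
            for row in panorama]


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_scores, student_scores):
    """Cross-entropy of the student's location distribution against the
    teacher's (an assumed stand-in for the paper's alignment objective)."""
    teacher = softmax(teacher_scores)
    student = softmax(student_scores)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


# Usage: the student input keeps a 90-degree window centred at azimuth 0.
mask = fov_column_mask(width=8, fov_deg=90, center_deg=0)
student_input = apply_mask([[1, 2, 3, 4, 5, 6, 7, 8]], mask)
```

In this sketch the teacher would score candidate locations from the full panorama and the student from `student_input`; minimizing `distillation_loss` pulls the student's distribution toward the teacher's. In practice the teacher's scores would be treated as fixed targets, which this toy version does not model.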
