GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization

Abstract

Cross-view localization, the task of estimating a camera's 3-degrees-of-freedom (3-DoF) pose by aligning ground-level images with satellite images, is crucial for large-scale outdoor applications such as autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. In this work, we propose GeoDistill, a geometry-guided, weakly supervised self-distillation framework that uses teacher-student learning with Field-of-View (FoV)-based masking to enhance local feature learning for robust cross-view localization. In GeoDistill, the teacher model localizes a panoramic image, while the student model predicts locations from a limited-FoV counterpart created by FoV-based masking. Aligning the student's predictions with those of the teacher makes the student focus on key features such as lane lines and ignore textureless regions such as roads. This results in more accurate predictions and reduced uncertainty, regardless of whether the query images are panoramas or limited-FoV images. Our experiments show that GeoDistill significantly improves localization performance across different frameworks. Additionally, we introduce a novel orientation estimation network that predicts relative orientation without requiring precise planar position ground truth. GeoDistill provides a scalable and efficient solution for real-world cross-view localization challenges. Code and model can be found at https://github.com/tongshw/GeoDistill.
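
The teacher-student idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). The function names, the equirectangular column-to-azimuth mapping, the zero fill value, and the choice of cross-entropy as the alignment loss are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def fov_column_mask(width, fov_deg, center_deg):
    """Per-column keep mask for an equirectangular panorama (assumed layout).

    Column i is assumed to cover azimuth (i + 0.5) * 360 / width degrees.
    Columns within fov_deg / 2 of center_deg are kept, the rest are masked.
    """
    half = fov_deg / 2.0
    mask = []
    for i in range(width):
        azimuth = (i + 0.5) * 360.0 / width
        # Smallest angular distance, accounting for the 360-degree wrap-around.
        diff = abs((azimuth - center_deg + 180.0) % 360.0 - 180.0)
        mask.append(diff <= half)
    return mask


def apply_mask(panorama, mask, fill=0.0):
    """Blank out masked columns; panorama is a list of rows of pixel values."""
    return [[px if keep else fill for px, keep in zip(row, mask)]
            for row in panorama]


def softmax(scores):
    """Numerically stable softmax over a flat list of location scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_scores, student_scores):
    """Cross-entropy of the student's location distribution against the
    teacher's (assumed loss; the teacher is treated as a fixed target)."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

In this sketch the teacher would score candidate locations from the full panorama, the student would score them from `apply_mask(panorama, fov_column_mask(...))`, and `distillation_loss` would be minimized with respect to the student only.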
