GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization

July 15, 2025
Authors: Shaowen Tong, Zimin Xia, Alexandre Alahi, Xuming He, Yujiao Shi
cs.AI

Abstract

Cross-view localization, the task of estimating a camera's three-degrees-of-freedom (3-DoF) pose by aligning ground-level images with satellite images, is crucial for large-scale outdoor applications such as autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. In this work, we propose GeoDistill, a geometry-guided, weakly supervised self-distillation framework that uses teacher-student learning with Field-of-View (FoV)-based masking to enhance local feature learning for robust cross-view localization. In GeoDistill, the teacher model localizes a panoramic image, while the student model predicts locations from a limited-FoV counterpart created by FoV-based masking. By aligning the student's predictions with those of the teacher, the student learns to focus on key features such as lane lines and to ignore textureless regions such as roads. This results in more accurate predictions and reduced uncertainty, regardless of whether the query images are panoramas or limited-FoV images. Our experiments show that GeoDistill significantly improves localization performance across different frameworks. Additionally, we introduce a novel orientation estimation network that predicts relative orientation without requiring precise planar position ground truth. GeoDistill provides a scalable and efficient solution for real-world cross-view localization challenges. Code and models are available at https://github.com/tongshw/GeoDistill.
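
The teacher-student setup described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the GeoDistill repository: the function names `fov_mask_panorama` and `distillation_loss`, the equirectangular panorama layout, the zero-fill masking, and the cross-entropy objective are all assumptions made for illustration, and the actual loss and masking used by the authors may differ.

```python
import numpy as np


def fov_mask_panorama(pano, fov_deg, center_deg):
    """Zero out panorama columns that fall outside a horizontal FoV window.

    Assumes (hypothetically) an equirectangular panorama of shape (H, W, C)
    whose columns span 360 degrees of azimuth.
    """
    w = pano.shape[1]
    # Azimuth of each column center, in [0, 360).
    azimuth = (np.arange(w) + 0.5) * 360.0 / w
    # Signed angular offset from the window center, wrapped to [-180, 180).
    offset = (azimuth - center_deg + 180.0) % 360.0 - 180.0
    keep = np.abs(offset) <= fov_deg / 2.0
    masked = pano.copy()
    masked[:, ~keep] = 0
    return masked, keep


def softmax2d(logits):
    """Turn a 2D map of location scores into a probability map."""
    z = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def distillation_loss(teacher_logits, student_logits, eps=1e-12):
    """Cross-entropy of the student's location distribution against the teacher's.

    The teacher output is treated as a fixed target. Using cross-entropy here
    is an illustrative assumption, not the objective confirmed by the paper.
    """
    t = softmax2d(teacher_logits)
    s = softmax2d(student_logits)
    return float(-(t * np.log(s + eps)).sum())


# Usage: the teacher would see the full panorama, the student a 90-degree crop.
pano = np.ones((4, 360, 3))
student_input, keep = fov_mask_panorama(pano, fov_deg=90, center_deg=0)

# Stand-ins for the location score maps the two models would produce.
teacher_logits = np.array([[0.0, 5.0], [0.0, 0.0]])
student_logits = np.zeros((2, 2))
loss = distillation_loss(teacher_logits, student_logits)
```

In this sketch the loss is smallest when the student's location distribution matches the teacher's, which is the alignment the abstract describes; in a real training loop the score maps would come from the localization networks rather than hand-written arrays.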