GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization

July 15, 2025
Authors: Shaowen Tong, Zimin Xia, Alexandre Alahi, Xuming He, Yujiao Shi
cs.AI

Abstract

Cross-view localization, the task of estimating a camera's 3-degrees-of-freedom (3-DoF) pose by aligning ground-level images with satellite images, is crucial for large-scale outdoor applications such as autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. In this work, we propose GeoDistill, a geometry-guided, weakly supervised self-distillation framework that uses teacher-student learning with Field-of-View (FoV)-based masking to enhance local feature learning for robust cross-view localization. In GeoDistill, the teacher model localizes a panoramic image, while the student model predicts locations from a limited-FoV counterpart created by FoV-based masking. By aligning the student's predictions with those of the teacher, the student learns to focus on key features such as lane lines and to ignore textureless regions such as road surfaces. This results in more accurate predictions and reduced uncertainty, regardless of whether the query images are panoramas or limited-FoV images. Our experiments show that GeoDistill significantly improves localization performance across different frameworks. Additionally, we introduce a novel orientation estimation network that predicts relative orientation without requiring precise planar position ground truth. GeoDistill provides a scalable and efficient solution for real-world cross-view localization challenges. Code and models are available at https://github.com/tongshw/GeoDistill.
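The teacher-student idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the GeoDistill repository. It assumes an equirectangular panorama whose columns map linearly to 360 degrees of azimuth, and it assumes a cross-entropy objective between softmax-normalized location scores. All function names (`fov_column_mask`, `apply_fov_mask`, `distillation_loss`) and the `teacher_temperature` parameter are hypothetical, and the loss actually used by the authors may differ.

```python
import math


def fov_column_mask(width, fov_deg, center_deg):
    """Return one boolean per panorama column: True if it lies inside the FoV.

    Assumption: column centers are spread linearly over 360 degrees of azimuth.
    """
    half = fov_deg / 2.0
    mask = []
    for col in range(width):
        azimuth = (col + 0.5) * 360.0 / width
        # Signed angular difference wrapped into [-180, 180)
        diff = (azimuth - center_deg + 180.0) % 360.0 - 180.0
        mask.append(abs(diff) <= half)
    return mask


def apply_fov_mask(panorama, mask, fill=0.0):
    """Replace pixels outside the FoV with `fill`; `panorama` is a list of rows."""
    return [[px if keep else fill for px, keep in zip(row, mask)]
            for row in panorama]


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a flat list of location scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_scores, student_scores, teacher_temperature=1.0):
    """Cross-entropy of the student's location distribution against the teacher's.

    Assumption: both models output one score per candidate satellite-map cell,
    flattened to a list. The teacher's output is treated as a fixed target.
    """
    target = softmax(teacher_scores, teacher_temperature)
    predicted = softmax(student_scores)
    return -sum(t * math.log(p + 1e-12) for t, p in zip(target, predicted))


# Usage: the student sees a 90-degree window of an 8-column panorama.
mask = fov_column_mask(width=8, fov_deg=90.0, center_deg=180.0)
student_input = apply_fov_mask([[1.0] * 8, [2.0] * 8], mask)
loss = distillation_loss([0.0, 5.0, 0.0], [0.0, 4.0, 1.0])
```

In this sketch the teacher would score candidate locations from the full panorama, the student would score them from `student_input`, and minimizing `loss` would pull the student's distribution toward the teacher's. A teacher temperature below 1 sharpens the target distribution, which is one common choice in self-distillation setups rather than a confirmed detail of this work.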