See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation

September 26, 2025
Authors: Chih Yao Hu, Yang-Sen Lin, Yuna Lee, Chih-Hai Su, Jie-Ying Lee, Shr-Ruei Tsai, Chin-Yang Lin, Kuan-Wen Chen, Tsung-Wei Ke, Yu-Lun Liu
cs.AI

Abstract

We present See, Point, Fly (SPF), a training-free aerial vision-and-language navigation (AVLN) framework built atop vision-language models (VLMs). SPF is capable of navigating to any goal based on any type of free-form instruction in any kind of environment. In contrast to existing VLM-based approaches that treat action prediction as a text generation task, our key insight is to frame action prediction for AVLN as a 2D spatial grounding task. SPF harnesses VLMs to decompose vague language instructions into iterative annotation of 2D waypoints on the input image. Combined with the predicted traveling distance, SPF transforms the predicted 2D waypoints into 3D displacement vectors that serve as action commands for UAVs. Moreover, SPF adaptively adjusts the traveling distance to make navigation more efficient. Notably, SPF performs navigation in a closed-loop control manner, enabling UAVs to follow dynamic targets in dynamic environments. SPF sets a new state of the art on the DRL simulation benchmark, outperforming the previous best method by an absolute margin of 63%. In extensive real-world evaluations, SPF outperforms strong baselines by a large margin. We also conduct comprehensive ablation studies to highlight the effectiveness of our design choices. Lastly, SPF shows remarkable generalization to different VLMs. Project page: https://spf-web.pages.dev
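The abstract does not spell out how a 2D waypoint plus a traveling distance becomes a 3D displacement. The following is a minimal sketch of one plausible approach. It assumes a calibrated pinhole camera that faces along the drone's heading, back-projects the waypoint pixel into a viewing ray, and scales that ray to the predicted distance. The `Intrinsics` class, its `fx`/`fy`/`cx`/`cy` fields, the `waypoint_to_displacement` helper, and the forward/right/up output frame are all hypothetical illustrations, not SPF's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Hypothetical pinhole camera intrinsics, in pixels."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def waypoint_to_displacement(u: float, v: float, distance: float, K: Intrinsics):
    """Map a 2D waypoint (u, v) and a travel distance to a 3D displacement.

    Assumption (not from the paper): the camera uses OpenCV axes
    (x right, y down, z forward) and is aligned with the drone's heading.
    Returns (forward, right, up) in meters, with Euclidean length == distance.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # Back-project the pixel onto the normalized image plane (z = 1).
    x = (u - K.cx) / K.fx
    y = (v - K.cy) / K.fy
    z = 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    # Scale the unit viewing ray to the requested travel distance.
    dx, dy, dz = (distance * c / norm for c in (x, y, z))
    # Remap camera axes to forward/right/up: forward=+z, right=+x, up=-y.
    return (dz, dx, -dy)
```

With this convention, a waypoint at the principal point yields a purely forward move of the given distance, and a waypoint above the image center yields a positive "up" component. In a closed-loop setting, the displacement would be recomputed from each new frame rather than executed as a single open-loop trajectory.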