CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs

March 3, 2025
Authors: Artem Lykov, Valerii Serpiva, Muhammad Haris Khan, Oleg Sautenkov, Artyom Myshlyaev, Grik Tadevosyan, Yasheerah Yaqoot, Dzmitry Tsetserukou
cs.AI

Abstract

This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset of over 8,000 simulated flight trajectories spanning three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real-time 4D action commands from first-person visual input and textual instructions. To further improve performance in intricate scenarios, we propose CognitiveDrone-R1, which adds a Vision-Language Model (VLM) reasoning module that simplifies task directives before they reach the high-frequency controller. Experimental evaluations on our open-source benchmark, CognitiveDroneBench, show that a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 attains 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include the development of a state-of-the-art VLA model for UAV control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations. The complete repository is available at cognitivedrone.github.io.
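The CognitiveDrone-R1 design described above can be pictured as a two-rate loop: a slower reasoning module rewrites the task instruction into a simpler directive, and the fast VLA policy turns each first-person frame plus that directive into a 4D command. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The `TwoRateController` class, the `replan_every` parameter, the stub models, the example instruction, and the reading of the 4D action as (vx, vy, vz, yaw_rate) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed action layout: the abstract only says "4D action commands";
# (vx, vy, vz, yaw_rate) is a hypothetical interpretation.
Action = Tuple[float, float, float, float]


@dataclass
class TwoRateController:
    """Hypothetical two-rate loop in the spirit of CognitiveDrone-R1 (not the authors' code).

    reason: slow VLM stand-in, (frame, instruction) -> simplified instruction
    act:    fast VLA stand-in, (frame, simplified instruction) -> 4D action
    replan_every: control steps that reuse one simplified instruction (assumed value)
    """
    reason: Callable[[str, str], str]
    act: Callable[[str, str], Action]
    replan_every: int = 10

    def run(self, frames: Sequence[str], instruction: str) -> List[Action]:
        actions: List[Action] = []
        simplified = instruction
        for step, frame in enumerate(frames):
            # Call the expensive reasoning module only every `replan_every` steps.
            if step % self.replan_every == 0:
                simplified = self.reason(frame, instruction)
            # The fast policy runs on every frame using the latest simplified directive.
            actions.append(self.act(frame, simplified))
        return actions


# Toy usage with stub models; strings stand in for first-person camera frames.
calls: Dict[str, int] = {"reason": 0, "act": 0}


def toy_reason(frame: str, instruction: str) -> str:
    calls["reason"] += 1
    return "fly through the left gate"  # hypothetical simplified directive


def toy_act(frame: str, simplified: str) -> Action:
    calls["act"] += 1
    return (0.5, 0.0, 0.0, 0.0)  # constant placeholder command


controller = TwoRateController(toy_reason, toy_act, replan_every=10)
frames = [f"frame_{i}" for i in range(25)]
acts = controller.run(frames, "Fly through the gate showing the answer to 2 + 2")
print(calls)  # {'reason': 3, 'act': 25}
```

Decoupling the two rates keeps the expensive reasoning call off the per-frame control path. The abstract does not say whether CognitiveDrone-R1 simplifies the instruction once per episode or re-plans on a schedule, so `replan_every` is only a tunable placeholder here.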
March 6, 2025