CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs
March 3, 2025
Authors: Artem Lykov, Valerii Serpiva, Muhammad Haris Khan, Oleg Sautenkov, Artyom Myshlyaev, Grik Tadevosyan, Yasheerah Yaqoot, Dzmitry Tsetserukou
cs.AI
Abstract
This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA)
model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand
advanced cognitive abilities. Trained on a dataset comprising over 8,000
simulated flight trajectories across three key categories (Human Recognition,
Symbol Understanding, and Reasoning), the model generates real-time 4D action
commands based on first-person visual inputs and textual instructions. To
further enhance performance in intricate scenarios, we propose
CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM)
reasoning module to simplify task directives prior to high-frequency control.
Experimental evaluations using our open-source benchmark, CognitiveDroneBench,
reveal that while a racing-oriented model (RaceVLA) achieves an overall success
rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and
CognitiveDrone-R1 attains a success rate of 77.2%. These results demonstrate
improvements of up to 30% in critical cognitive tasks, underscoring the
effectiveness of incorporating advanced reasoning capabilities into UAV control
systems. Our contributions include the development of a state-of-the-art VLA
model for UAV control and the introduction of the first dedicated benchmark for
assessing cognitive tasks in drone operations. The complete repository is
available at cognitivedrone.github.io.
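
The overall success rates reported above can be compared directly. The following is a minimal sketch that computes the percentage-point gaps between the three models. The dictionary and function names are illustrative, not taken from the CognitiveDrone repository, and the abstract does not state how its "up to 30%" figure for critical cognitive tasks was derived, so this covers only the overall rates.

```python
# Overall success rates (%) as reported in the abstract.
success_rates = {
    "RaceVLA": 31.3,
    "CognitiveDrone": 59.6,
    "CognitiveDrone-R1": 77.2,
}


def point_gain(baseline: str, model: str) -> float:
    """Percentage-point difference in overall success rate (model minus baseline)."""
    return round(success_rates[model] - success_rates[baseline], 1)


# Hypothetical comparison pairs chosen for illustration.
pairs = [
    ("RaceVLA", "CognitiveDrone"),
    ("CognitiveDrone", "CognitiveDrone-R1"),
    ("RaceVLA", "CognitiveDrone-R1"),
]

for baseline, model in pairs:
    print(f"{model} vs {baseline}: {point_gain(baseline, model):+.1f} points")
```

These are differences in percentage points on the overall benchmark, not relative improvements, and per-category results may differ.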