CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs

March 3, 2025
Authors: Artem Lykov, Valerii Serpiva, Muhammad Haris Khan, Oleg Sautenkov, Artyom Myshlyaev, Grik Tadevosyan, Yasheerah Yaqoot, Dzmitry Tsetserukou
cs.AI

Abstract

This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset of over 8,000 simulated flight trajectories across three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real-time 4D action commands from first-person visual inputs and textual instructions. To further enhance performance in intricate scenarios, we propose CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM) reasoning module to simplify task directives prior to high-frequency control. Experimental evaluations using our open-source benchmark, CognitiveDroneBench, show that while a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 attains 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include the development of a state-of-the-art VLA model for UAV control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations. The complete repository is available at cognitivedrone.github.io.
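
The two-stage idea in the abstract, a reasoning module that simplifies the task directive before a high-frequency controller acts on it, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the names `Action4D`, `run_episode`, and `reason_every`, the choice of three velocities plus a yaw rate as the four action dimensions, and the refresh interval are all assumptions made for this example, and both models are replaced by toy stand-in functions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class Action4D:
    """Hypothetical 4D command; the abstract does not specify the four dimensions."""
    vx: float
    vy: float
    vz: float
    yaw_rate: float


# Stand-in signatures for the two models (assumed for illustration).
Reasoner = Callable[[Any, str], str]      # (frame, instruction) -> simplified directive
Policy = Callable[[Any, str], Action4D]   # (frame, directive) -> action


def run_episode(
    frames: Sequence[Any],
    instruction: str,
    reasoner: Reasoner,
    policy: Policy,
    reason_every: int = 10,
) -> List[Action4D]:
    """Call the slow reasoner every `reason_every` frames and the fast policy on every frame."""
    actions: List[Action4D] = []
    directive = instruction
    for step, frame in enumerate(frames):
        if step % reason_every == 0:
            # Low-rate step: rewrite the original instruction into a simpler directive.
            directive = reasoner(frame, instruction)
        # High-rate step: map the current frame and directive to a 4D command.
        actions.append(policy(frame, directive))
    return actions


def toy_reasoner(frame: Any, instruction: str) -> str:
    # Toy rewrite that resolves a small arithmetic cue into a direct target.
    return instruction.replace("the gate showing 2 + 2", "gate 4")


def toy_policy(frame: Any, directive: str) -> Action4D:
    # Toy controller: always fly forward, regardless of input.
    return Action4D(vx=1.0, vy=0.0, vz=0.0, yaw_rate=0.0)


if __name__ == "__main__":
    commands = run_episode(
        frames=list(range(25)),
        instruction="fly through the gate showing 2 + 2",
        reasoner=toy_reasoner,
        policy=toy_policy,
    )
    print(len(commands), commands[0])
```

Running the reasoner at a lower rate than the policy is one plausible way to keep the control loop fast; the actual scheduling used in CognitiveDrone-R1 is not described in the abstract.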