CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs

Abstract

This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset of over 8,000 simulated flight trajectories across three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real-time 4D action commands from first-person visual inputs and textual instructions. To further improve performance in intricate scenarios, we propose CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM) reasoning module that simplifies task directives prior to high-frequency control. Experimental evaluations on our open-source benchmark, CognitiveDroneBench, show that a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 attains 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include the development of a state-of-the-art VLA model for UAV control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations. The complete repository is available at cognitivedrone.github.io.
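
The two-stage flow described for CognitiveDrone-R1 (a reasoning step that simplifies the instruction, followed by high-frequency control) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Action4D`, `run_episode`, `toy_reason`, and `toy_policy` are hypothetical, the four action components are unnamed placeholders because the abstract does not specify them, and running the reasoning step exactly once per episode is an assumption made for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Action4D:
    """One 4D command. The abstract does not name the components,
    so c0..c3 are placeholder field names (assumption)."""
    c0: float
    c1: float
    c2: float
    c3: float


def run_episode(
    frames: Iterable[bytes],
    instruction: str,
    reason: Callable[[bytes, str], str],
    policy: Callable[[bytes, str], Action4D],
) -> List[Action4D]:
    """Simplify the instruction once, then issue one action per frame."""
    frame_iter = iter(frames)
    first = next(frame_iter, None)
    if first is None:
        return []  # no observations, so nothing to reason about or control
    simplified = reason(first, instruction)  # slow reasoning step, run once
    actions = [policy(first, simplified)]
    for frame in frame_iter:  # fast control loop reuses the simplified text
        actions.append(policy(frame, simplified))
    return actions


# Toy stand-ins so the sketch runs; real modules would be learned models.
reason_calls: List[str] = []


def toy_reason(frame: bytes, instruction: str) -> str:
    reason_calls.append(instruction)  # record each call to the slow module
    return "fly to the gate on the left"


def toy_policy(frame: bytes, instruction: str) -> Action4D:
    return Action4D(1.0, 0.0, 0.0, 0.0)


actions = run_episode(
    [b"f0", b"f1", b"f2"], "some long riddle", toy_reason, toy_policy
)
print(len(actions), len(reason_calls))
```

The point of the split is that the expensive reasoning call stays out of the per-frame loop, so the control policy only ever sees the shorter, simplified directive.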
