MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

Abstract

Robotic simulators are a cornerstone of modern research in aerial robotics, serving both as a vehicle for developing new control algorithms and as a data source for training reinforcement learning (RL) policies. Yet existing quadcopter learning environments often trade off physical fidelity, multi-agent support, and the throughput required by modern deep RL pipelines. In this paper, we present MuJoCo-Drones-Gym, an open-source, Gymnasium-compatible multi-drone environment built on top of the MuJoCo physics engine. MuJoCo-Drones-Gym supports an arbitrary number of Bitcraze Crazyflie 2.x nano-quadcopters and exposes a modular API for selecting (i) the physics model (rigid-body MuJoCo, explicit Python dynamics, or any subset of ground effect, blade drag, and inter-drone downwash), (ii) the action interface (per-motor RPMs, collective normalized thrust, velocity setpoints, or PID waypoint commands), and (iii) the observation space (kinematic state vectors, RGB/depth/segmentation camera images, or neighbourhood adjacency information). A PettingZoo ParallelEnv wrapper enables drop-in multi-agent reinforcement learning, and a suite of seven task environments (hover, velocity tracking, multi-drone hover, waypoint navigation, formation flight, gate racing, and a generic multi-agent template) demonstrates the breadth of the interface. We describe the environment design and the underlying physics and quadcopter dynamics, and we illustrate the environment's use through control and learning examples that mirror those of the closely related gym-pybullet-drones project while taking advantage of MuJoCo's improved contact handling, rendering, and parallelizability.
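To make two of the options above more concrete, here is a minimal Python sketch of (a) mapping a normalized collective-thrust action to per-motor RPMs and (b) computing rotor thrust with an optional ground-effect term. It does not use MuJoCo-Drones-Gym's API: the names `PhysicsConfig`, `hover_rpm`, `collective_to_rpms`, and `motor_thrusts` are hypothetical. The quadratic thrust law and the ground-effect formula are assumed to follow the simple models used in gym-pybullet-drones, and the constants are illustrative Crazyflie 2.x values, not values read from this environment.

```python
import math
from dataclasses import dataclass

# Illustrative Crazyflie 2.x-style constants (ASSUMED: believed to match the
# cf2x model in gym-pybullet-drones; MuJoCo-Drones-Gym may use other values).
MASS = 0.027              # vehicle mass [kg]
G = 9.8                   # gravitational acceleration [m/s^2]
KF = 3.16e-10             # thrust coefficient [N / RPM^2]
THRUST2WEIGHT = 2.25      # max total thrust divided by weight
PROP_RADIUS = 2.31348e-2  # propeller radius [m]
GND_EFF_COEFF = 11.36859  # dimensionless ground-effect gain


@dataclass(frozen=True)
class PhysicsConfig:
    """Hypothetical toggle set for optional aerodynamic effects."""
    ground_effect: bool = False


def hover_rpm() -> float:
    """Per-motor RPM at which four rotors exactly cancel gravity."""
    return math.sqrt(MASS * G / (4.0 * KF))


def collective_to_rpms(a: float) -> list[float]:
    """Map a normalized collective thrust a in [0, 1] to four equal RPMs.

    a = 0 means zero thrust; a = 1 means the assumed maximum total thrust.
    Values outside [0, 1] are clipped.
    """
    a = min(max(a, 0.0), 1.0)
    max_total_thrust = THRUST2WEIGHT * MASS * G
    per_motor_thrust = a * max_total_thrust / 4.0
    # Invert the quadratic thrust law T = KF * rpm^2 for each motor.
    return [math.sqrt(per_motor_thrust / KF)] * 4


def motor_thrusts(rpms, prop_height: float, cfg: PhysicsConfig) -> list[float]:
    """Per-motor thrust [N], with an optional simplified ground effect."""
    thrusts = [KF * w * w for w in rpms]
    if cfg.ground_effect:
        # Extra lift that grows as the rotors approach the ground.
        h = max(prop_height, 1e-3)  # clip height to avoid division by zero
        gain = GND_EFF_COEFF * (PROP_RADIUS / (4.0 * h)) ** 2
        thrusts = [t * (1.0 + gain) for t in thrusts]
    return thrusts


if __name__ == "__main__":
    rpms = [hover_rpm()] * 4
    free_air = sum(motor_thrusts(rpms, 1.0, PhysicsConfig()))
    near_ground = sum(motor_thrusts(rpms, 0.02, PhysicsConfig(ground_effect=True)))
    print(f"hover RPM: {hover_rpm():.1f}")
    print(f"total thrust, free air: {free_air:.4f} N (weight {MASS * G:.4f} N)")
    print(f"total thrust, 2 cm above ground: {near_ground:.4f} N")
```

Because each effect is a flag on a configuration object rather than a separate environment class, effects can be switched on independently. This is one plausible way to realize the "any subset" physics option; blade drag and inter-drone downwash would be further flags in the same style.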