MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

Abstract

Robotic simulators are a cornerstone of modern aerial robotics research, serving both as a testbed for developing new control algorithms and as a data source for training reinforcement learning (RL) policies. Yet existing quadcopter learning environments often trade off physical fidelity, multi-agent support, and the throughput required by modern deep RL pipelines. In this paper, we present MuJoCo-Drones-Gym, an open-source, Gymnasium-compatible multi-drone environment built on the MuJoCo physics engine. MuJoCo-Drones-Gym supports an arbitrary number of Bitcraze Crazyflie 2.x nano-quadcopters and exposes a modular API for selecting (i) the physics model (rigid-body MuJoCo, explicit Python dynamics, or any subset of ground effect, blade drag, and inter-drone downwash), (ii) the action interface (per-motor RPMs, collective normalized thrust, velocity setpoints, or PID waypoint commands), and (iii) the observation space (kinematic state vectors, RGB / depth / segmentation camera images, or neighborhood adjacency information). A PettingZoo ParallelEnv wrapper enables drop-in multi-agent RL, and a suite of seven task environments (hover, velocity tracking, multi-drone hover, waypoint navigation, formation flight, gate racing, and a generic multi-agent template) demonstrates the breadth of the interface. We describe the environment design and the underlying physics and quadcopter dynamics, and we illustrate the environment's use through control and learning examples that mirror those of the closely related gym-pybullet-drones project while taking advantage of MuJoCo's improved contact handling, rendering, and parallelizability.