MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning
June 6, 2026
Author: Manan Tayal
cs.AI
Abstract
Robotic simulators are a cornerstone of modern research in aerial robotics, serving both as a vehicle for developing new control algorithms and as the data source for training reinforcement learning (RL) policies. Yet existing quadcopter learning environments often trade off physical fidelity, multi-agent support, and the throughput required by modern deep RL pipelines. In this paper, we present MuJoCo-Drones-Gym, an open-source, Gymnasium-compatible multi-drone environment built on top of the MuJoCo physics engine. MuJoCo-Drones-Gym supports an arbitrary number of Bitcraze Crazyflie 2.x nano-quadcopters and exposes a modular API for selecting (i) the physics model (rigid-body MuJoCo, explicit Python dynamics, or any subset of ground effect, blade drag, and inter-drone downwash), (ii) the action interface (per-motor RPMs, collective normalized thrust, velocity setpoints, or PID waypoint commands), and (iii) the observation space (kinematic state vectors, RGB / depth / segmentation cameras, or neighbourhood adjacency information). A PettingZoo ParallelEnv wrapper enables drop-in multi-agent reinforcement learning, while a suite of seven task environments (hover, velocity tracking, multi-drone hover, waypoint navigation, formation flight, gate racing, and a generic multi-agent template) demonstrates the breadth of the interface. We describe the environment design, the underlying physics, and the quadcopter dynamics, and illustrate the environment's use through control and learning examples that mirror those of the closely related gym-pybullet-drones project, while taking advantage of MuJoCo's improved contact handling, rendering, and parallelizability.
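To make the three configuration axes concrete, the following is a minimal sketch of how such a modular selection could be modelled in plain Python. Every name in it (`Physics`, `ActionType`, `ObservationType`, `EnvConfig`, `hover_rpm`, `collective_thrust`) is hypothetical and is not taken from the MuJoCo-Drones-Gym package. The mass and thrust coefficient are approximate Crazyflie 2.x values assumed here for illustration only, and the quadratic thrust law is a common simplification rather than the model the paper necessarily uses.

```python
import math
from dataclasses import dataclass
from enum import Enum, Flag, auto


class Physics(Flag):
    """Hypothetical physics options; a Flag lets effects be combined freely."""
    MUJOCO = auto()         # rigid-body MuJoCo stepping
    EXPLICIT = auto()       # explicit Python dynamics
    GROUND_EFFECT = auto()
    BLADE_DRAG = auto()
    DOWNWASH = auto()       # inter-drone downwash


class ActionType(Enum):
    """Hypothetical action interfaces named after the abstract's list."""
    RPM = "rpm"             # per-motor RPMs
    THRUST = "thrust"       # collective normalized thrust
    VELOCITY = "velocity"   # velocity setpoints
    PID = "pid"             # PID waypoint commands


class ObservationType(Enum):
    """Hypothetical observation spaces named after the abstract's list."""
    KINEMATIC = "kinematic"
    CAMERA = "camera"       # RGB / depth / segmentation
    ADJACENCY = "adjacency"


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical container bundling one choice per axis."""
    num_drones: int = 1
    physics: Physics = Physics.MUJOCO
    action: ActionType = ActionType.RPM
    observation: ObservationType = ObservationType.KINEMATIC


# Assumed, approximate constants for a Crazyflie 2.x (illustration only).
MASS_KG = 0.027   # vehicle mass
GRAVITY = 9.81    # m/s^2
KF = 3.16e-10     # thrust coefficient, N per RPM^2


def collective_thrust(rpms, kf=KF):
    """Total thrust in newtons under the simplified law F_i = kf * rpm_i**2."""
    return sum(kf * rpm * rpm for rpm in rpms)


def hover_rpm(mass=MASS_KG, kf=KF, g=GRAVITY):
    """Per-motor RPM at which four identical rotors balance the weight."""
    return math.sqrt(mass * g / (4.0 * kf))


config = EnvConfig(
    num_drones=3,
    physics=Physics.MUJOCO | Physics.GROUND_EFFECT | Physics.DOWNWASH,
    action=ActionType.RPM,
)
rpm = hover_rpm()
thrust = collective_thrust([rpm] * 4)
```

Under these assumptions, `Physics.DOWNWASH in config.physics` is true while `Physics.BLADE_DRAG in config.physics` is false, which mirrors the "any subset" wording of the abstract, and `thrust` equals the vehicle weight `MASS_KG * GRAVITY` up to floating-point rounding.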