MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

Abstract

Robotic simulators are a cornerstone of modern aerial robotics research, serving both as a vehicle for developing new control algorithms and as the data source for training reinforcement learning (RL) policies. Yet existing quadcopter learning environments often force a trade-off between physical fidelity, multi-agent support, and the throughput required by modern deep RL pipelines. In this paper, we present MuJoCo-Drones-Gym, an open-source, Gymnasium-compatible multi-drone environment built on top of the MuJoCo physics engine. MuJoCo-Drones-Gym supports an arbitrary number of Bitcraze Crazyflie 2.x nano-quadcopters and exposes a modular API for selecting (i) the physics model (rigid-body MuJoCo, explicit Python dynamics, or any subset of ground effect, blade drag, and inter-drone downwash), (ii) the action interface (per-motor RPMs, collective normalized thrust, velocity setpoints, or PID waypoint commands), and (iii) the observation space (kinematic state vectors, RGB / depth / segmentation cameras, or neighbourhood adjacency information). A PettingZoo ParallelEnv wrapper enables drop-in multi-agent reinforcement learning, while a suite of seven task environments (hover, velocity tracking, multi-drone hover, waypoint navigation, formation flight, gate racing, and a generic multi-agent template) demonstrates the breadth of the interface. We describe the environment design and the underlying physics and quadcopter dynamics, and we illustrate the environment's use through control and learning examples that mirror those of the closely related gym-pybullet-drones project while taking advantage of MuJoCo's improved contact handling, rendering, and parallelizability.
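
As a rough illustration of the three configuration axes listed above, the following sketch models them as plain Python enums collected in a small configuration object. Every name here (`Physics`, `AeroEffect`, `ActionType`, `ObservationType`, `DroneEnvConfig`) is hypothetical and is not taken from the MuJoCo-Drones-Gym API. Treating ground effect, blade drag, and downwash as optional flags layered on a base physics model is likewise only one assumed reading of the description.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto


class Physics(Enum):
    """Base physics model (hypothetical names)."""
    MUJOCO = "mujoco"   # rigid-body stepping by the MuJoCo engine
    PYTHON = "python"   # explicit Python dynamics


class AeroEffect(Flag):
    """Optional aerodynamic effects; any subset may be combined with `|`."""
    NONE = 0
    GROUND_EFFECT = auto()
    BLADE_DRAG = auto()
    DOWNWASH = auto()


class ActionType(Enum):
    """Action interfaces named in the abstract."""
    RPM = "rpm"                    # per-motor RPMs
    THRUST = "thrust"              # collective normalized thrust
    VELOCITY = "velocity"          # velocity setpoints
    PID_WAYPOINT = "pid_waypoint"  # PID waypoint commands


class ObservationType(Enum):
    """Observation spaces named in the abstract."""
    KINEMATIC = "kinematic"
    RGB = "rgb"
    DEPTH = "depth"
    SEGMENTATION = "segmentation"
    ADJACENCY = "adjacency"


@dataclass(frozen=True)
class DroneEnvConfig:
    """Hypothetical bundle of the choices a user would make per environment."""
    num_drones: int = 1
    physics: Physics = Physics.MUJOCO
    effects: AeroEffect = AeroEffect.NONE
    action: ActionType = ActionType.RPM
    observation: ObservationType = ObservationType.KINEMATIC

    def __post_init__(self) -> None:
        # "Arbitrary number" of drones still means at least one.
        if self.num_drones < 1:
            raise ValueError("num_drones must be >= 1")


# Example: three drones, Python dynamics, ground effect and downwash enabled.
cfg = DroneEnvConfig(
    num_drones=3,
    physics=Physics.PYTHON,
    effects=AeroEffect.GROUND_EFFECT | AeroEffect.DOWNWASH,
    action=ActionType.VELOCITY,
)
```

Keeping the three axes as independent fields reflects the modular selection the abstract describes: changing the action interface, for instance, does not require touching the physics or observation choice.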