CAJun: Continuous Adaptive Jumping using a Learned Centroidal Controller

Abstract

We present CAJun, a novel hierarchical learning and control framework that enables legged robots to jump continuously with adaptive jumping distances. CAJun consists of a high-level centroidal policy and a low-level leg controller. We use reinforcement learning (RL) to train the centroidal policy, which specifies the gait timing, base velocity, and swing foot position for the leg controller. Following that gait timing, the leg controller uses optimal control to compute motor commands for the swing and stance legs that track the swing foot target and base velocity commands. Additionally, we reformulate the stance leg optimizer in the leg controller to speed up policy training by an order of magnitude. By combining RL with optimal control, our system achieves the versatility of learning while retaining the robustness of control methods, which makes it easy to transfer to real robots. We show that after 20 minutes of training on a single GPU, CAJun achieves continuous, long jumps with adaptive distances on a Go1 robot with a small sim-to-real gap. Moreover, the robot can jump across gaps up to 70 cm wide, which is over 40% wider than existing methods.
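The two-level structure described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `CentroidalCommand`, `centroidal_policy`, and `stance_forces`, the mass and gain values, and the fixed command returned by the policy stub are hypothetical. In particular, the stance-leg step below is a plain minimum-norm least-squares force split standing in for an optimal-control solver; it is not the reformulated optimizer the abstract refers to.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class CentroidalCommand:
    """Hypothetical container for what the high-level policy sends down."""
    stepping_frequency: float      # gait timing, in Hz
    base_velocity: np.ndarray      # desired base linear velocity (m/s), shape (3,)
    swing_foot_target: np.ndarray  # desired swing foot position (m), shape (3,)


def centroidal_policy(observation: np.ndarray) -> CentroidalCommand:
    """Stub for the RL-trained policy: ignores the observation, returns a fixed command."""
    return CentroidalCommand(
        stepping_frequency=1.5,
        base_velocity=np.array([1.0, 0.0, 0.0]),
        swing_foot_target=np.array([0.2, 0.0, -0.25]),
    )


def stance_forces(cmd, base_velocity, in_contact, mass=12.0, kd=10.0):
    """Stand-in stance-leg solver (assumed values for mass and kd).

    Picks per-leg forces whose sum equals the net force needed to track the
    commanded base velocity, using the minimum-norm least-squares solution.
    """
    in_contact = np.asarray(in_contact, dtype=bool)
    gravity = np.array([0.0, 0.0, -9.81])
    desired_acc = kd * (cmd.base_velocity - np.asarray(base_velocity))
    net_force = mass * (desired_acc - gravity)

    forces = np.zeros((in_contact.size, 3))
    n_stance = int(in_contact.sum())
    if n_stance == 0:
        return forces  # flight phase: no leg can push on the ground

    # One 3x3 identity per stance leg, so A @ f is the sum of the leg forces.
    A = np.hstack([np.eye(3)] * n_stance)
    f, *_ = np.linalg.lstsq(A, net_force, rcond=None)
    forces[in_contact] = f.reshape(n_stance, 3)
    return forces


cmd = centroidal_policy(np.zeros(4))
forces = stance_forces(cmd, np.zeros(3), [True, True, False, False])
```

In this sketch the policy only decides what the body should do, and the lower level turns that into per-leg forces, which mirrors the division of labor the abstract describes. A real leg controller would additionally handle swing-leg tracking, friction limits, and torque mapping, none of which are modeled here.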
