Discrete-Time Hybrid Automata Learning: Legged Locomotion Meets Skateboarding

Abstract



This paper introduces Discrete-time Hybrid Automata Learning (DHAL), a framework that uses on-policy reinforcement learning (RL) to identify and execute mode switches without trajectory segmentation or event-function learning. Hybrid dynamical systems, which combine continuous flow with discrete mode switching, can model robotics tasks such as legged locomotion. Model-based methods usually depend on predefined gaits, whereas model-free approaches lack explicit knowledge of mode switching. Current methods first identify discrete modes through segmentation and then regress the continuous flow; learning high-dimensional, complex rigid-body dynamics without trajectory labels or segmentation remains a challenging open problem. Our approach combines a Beta policy distribution with a multi-critic architecture to model contact-guided motions, and we demonstrate it on a challenging task: a quadrupedal robot riding a skateboard. We validate the method in simulation and in real-world tests, demonstrating robust performance on hybrid dynamical systems.
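The abstract names two learning ingredients, a Beta policy distribution and a multi-critic architecture, without detailing them. The Python sketch below shows one common way each can be realized in a PPO-style on-policy learner. It is an illustration under stated assumptions, not the DHAL implementation: actions are assumed scalar and bounded in `[low, high]`; each critic is assumed to score its own reward group; per-group advantages are computed with GAE, normalized independently, and summed with hand-chosen weights. All function names, weights, and the reward groups in the usage are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def beta_log_prob(u: float, alpha: float, beta: float) -> float:
    """Log-density of Beta(alpha, beta) at u in the open interval (0, 1)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1.0) * math.log(u) + (beta - 1.0) * math.log(1.0 - u)


def sample_bounded_action(alpha: float, beta: float, low: float, high: float,
                          rng: random.Random) -> Tuple[float, float]:
    """Draw u ~ Beta(alpha, beta) and rescale it affinely to [low, high].

    The Beta support is bounded, so the action always respects the limits
    without clipping. The returned log-prob is that of u; the constant
    Jacobian log(high - low) cancels in a PPO probability ratio.
    """
    eps = 1e-6
    u = rng.betavariate(alpha, beta)
    u = min(max(u, eps), 1.0 - eps)  # keep log(u) and log(1 - u) finite
    return low + (high - low) * u, beta_log_prob(u, alpha, beta)


def gae(rewards: Sequence[float], values: Sequence[float], last_value: float,
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation for one reward group (no terminations)."""
    advantages = [0.0] * len(rewards)
    running, next_value = 0.0, last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def normalize(xs: Sequence[float]) -> List[float]:
    """Zero-mean, unit-variance normalization of one advantage batch."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) + 1e-8
    return [(x - mean) / std for x in xs]


def multi_critic_advantage(reward_groups: Sequence[Sequence[float]],
                           value_groups: Sequence[Sequence[float]],
                           last_values: Sequence[float],
                           weights: Sequence[float]) -> List[float]:
    """One critic per reward group: GAE per group, normalize each, weighted sum."""
    horizon = len(reward_groups[0])
    combined = [0.0] * horizon
    for rewards, values, last_v, w in zip(reward_groups, value_groups,
                                          last_values, weights):
        adv = normalize(gae(rewards, values, last_v))
        for t in range(horizon):
            combined[t] += w * adv[t]
    return combined


if __name__ == "__main__":
    rng = random.Random(0)
    action, logp = sample_bounded_action(2.0, 5.0, low=-1.0, high=1.0, rng=rng)
    print(f"action={action:.3f}  log_prob={logp:.3f}")
    # Two hypothetical reward groups, e.g. velocity tracking and regularization.
    adv = multi_critic_advantage(
        reward_groups=[[1.0, 0.5, 0.2, 0.0], [-0.01, -0.02, -0.01, -0.03]],
        value_groups=[[0.8, 0.6, 0.3, 0.1], [-0.02, -0.02, -0.02, -0.02]],
        last_values=[0.0, -0.02],
        weights=[1.0, 0.5],
    )
    print([round(a, 3) for a in adv])
```

Normalizing each group separately keeps a large-magnitude reward term from drowning out a small or sparse one, which is the usual motivation for splitting critics. The bounded Beta support likewise avoids the bias that clipping Gaussian samples at action limits can introduce.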
