Learning Vision-based Pursuit-Evasion Robot Policies

Abstract

Learning strategic robot behavior, like that required in pursuit-evasion interactions, under real-world constraints is extremely challenging. It requires exploiting the dynamics of the interaction and planning through both physical state and latent intent uncertainty. In this paper, we transform this intractable problem into a supervised learning problem, where a fully-observable robot policy generates supervision for a partially-observable one. We find that the quality of the supervision signal for the partially-observable pursuer policy depends on two key factors: the balance of diversity and optimality of the evader's behavior, and the strength of the modeling assumptions in the fully-observable policy. We deploy our policy on a physical quadruped robot with an RGB-D camera for pursuit-evasion interactions in the wild. Despite all the challenges, the sensing constraints bring about creativity: the robot is pushed to gather information when uncertain, predict intent from noisy measurements, and anticipate in order to intercept. Project webpage: https://abajcsy.github.io/vision-based-pursuit/
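The supervision idea described in the abstract, where a policy with full state access labels data for a policy with limited sensing, can be illustrated with a toy sketch. Everything below is a hypothetical simplification and not the paper's implementation: the teacher is assumed to be a proportional controller on the true relative position, the student is assumed to be a single scalar gain applied to a noisy relative-position measurement, and the names `teacher_action`, `noisy_observation`, `collect_dataset`, and `fit_student_gain` are invented for illustration.

```python
import random

def teacher_action(pursuer, evader, gain=0.8):
    """Hypothetical privileged teacher: sees true positions, steers at the evader."""
    return tuple(gain * (e - p) for p, e in zip(pursuer, evader))

def noisy_observation(pursuer, evader, rng, sigma=0.5):
    """Hypothetical student input: relative evader position plus Gaussian sensor noise."""
    return tuple((e - p) + rng.gauss(0.0, sigma) for p, e in zip(pursuer, evader))

def collect_dataset(n, rng):
    """Pair each noisy observation with the action the teacher took in that state."""
    data = []
    for _ in range(n):
        pursuer = (rng.uniform(-2.5, 2.5), rng.uniform(-2.5, 2.5))
        evader = (rng.uniform(-2.5, 2.5), rng.uniform(-2.5, 2.5))
        data.append((noisy_observation(pursuer, evader, rng),
                     teacher_action(pursuer, evader)))
    return data

def fit_student_gain(data):
    """Closed-form least-squares fit of a one-parameter student a = w * z."""
    num = sum(z_i * a_i for z, a in data for z_i, a_i in zip(z, a))
    den = sum(z_i * z_i for z, _ in data for z_i in z)
    return num / den

rng = random.Random(0)  # fixed seed so the sketch is reproducible
student_gain = fit_student_gain(collect_dataset(5000, rng))
print(f"teacher gain: 0.8, fitted student gain: {student_gain:.3f}")
```

In this toy setting the fitted student gain comes out somewhat smaller than the teacher's, because regressing on noisy inputs shrinks the estimate toward zero. A real system of the kind the abstract describes would presumably use a learned policy over image observations and interaction history rather than a scalar gain.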
