

Learning Vision-based Pursuit-Evasion Robot Policies

August 30, 2023
Authors: Andrea Bajcsy, Antonio Loquercio, Ashish Kumar, Jitendra Malik
cs.AI

Abstract

Learning strategic robot behavior -- like that required in pursuit-evasion interactions -- under real-world constraints is extremely challenging. It requires exploiting the dynamics of the interaction, and planning through both physical state and latent intent uncertainty. In this paper, we transform this intractable problem into a supervised learning problem, where a fully-observable robot policy generates supervision for a partially-observable one. We find that the quality of the supervision signal for the partially-observable pursuer policy depends on two key factors: the balance of diversity and optimality of the evader's behavior and the strength of the modeling assumptions in the fully-observable policy. We deploy our policy on a physical quadruped robot with an RGB-D camera on pursuit-evasion interactions in the wild. Despite all the challenges, the sensing constraints bring about creativity: the robot is pushed to gather information when uncertain, predict intent from noisy measurements, and anticipate in order to intercept. Project webpage: https://abajcsy.github.io/vision-based-pursuit/
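To make the core idea concrete, the sketch below illustrates the kind of privileged-supervision setup the abstract describes: a fully-observable "teacher" policy, which sees the evader's true state, labels actions for a partially-observable "student" pursuer that only receives onboard observations. This is a minimal, hypothetical PyTorch example; all dimensions, network shapes, and names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions (not from the paper).
STATE_DIM = 12   # full state: pursuer + evader pose/velocity
OBS_DIM = 64     # partial observation: noisy detection + proprioception
HIST_LEN = 16    # observation history so the student can infer latent intent
ACT_DIM = 3      # action: e.g. a planar velocity command

# Fully-observable teacher: maps the privileged full state to an action.
teacher = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACT_DIM),
)

# Partially-observable student: conditions on a history of noisy observations
# to cope with physical-state and intent uncertainty.
student = nn.Sequential(
    nn.Flatten(),
    nn.Linear(OBS_DIM * HIST_LEN, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def supervised_step(full_state, obs_history):
    """One behavior-cloning step: regress the student's action onto the
    teacher's action computed from the privileged full state."""
    with torch.no_grad():
        target_action = teacher(full_state)   # supervision signal
    pred_action = student(obs_history)        # partially-observable policy
    loss = nn.functional.mse_loss(pred_action, target_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Placeholder random data standing in for simulated rollouts.
batch = 32
loss = supervised_step(torch.randn(batch, STATE_DIM),
                       torch.randn(batch, HIST_LEN, OBS_DIM))
print(f"behavior-cloning loss: {loss:.4f}")
```

In this reading, the quality of the teacher's labels (and hence of the learned pursuer) hinges on the two factors the abstract names: how diverse yet near-optimal the simulated evader's behavior is, and how strong the modeling assumptions baked into the fully-observable policy are.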