Planning from Observation and Interaction

Abstract

Observational learning requires an agent to learn to perform a task by referencing only observations of the performed task. This work investigates the equivalent setting in real-world robot learning, where access to hand-designed rewards and demonstrator actions is not assumed. To address this data-constrained setting, this work presents a planning-based Inverse Reinforcement Learning (IRL) algorithm for world modeling from observation and interaction alone. Experiments conducted entirely in the real world demonstrate that this paradigm is effective for learning image-based manipulation tasks from scratch in under an hour, without assuming prior knowledge, pre-training, or data of any kind beyond task observations. Moreover, this work demonstrates that the learned world model representation is capable of online transfer learning in the real world from scratch. In comparison to existing approaches, including IRL, RL, and Behavior Cloning (BC), which have more restrictive assumptions, the proposed approach demonstrates significantly greater sample efficiency and success rates, enabling a practical path forward for online world modeling and planning from observation and interaction. Videos and more at: https://uwrobotlearning.github.io/mpail2/
