Planning from Observation and Interaction

February 27, 2026
Authors: Tyler Han, Siyang Shen, Rohan Baijal, Harine Ravichandiran, Bat Nemekhbold, Kevin Huang, Sanghun Jung, Byron Boots
cs.AI

Abstract

Observational learning requires an agent to learn to perform a task by referencing only observations of the task being performed. This work investigates the equivalent setting in real-world robot learning, where access to hand-designed rewards and demonstrator actions is not assumed. To address this data-constrained setting, this work presents a planning-based Inverse Reinforcement Learning (IRL) algorithm for world modeling from observation and interaction alone. Experiments conducted entirely in the real world demonstrate that this paradigm is effective for learning image-based manipulation tasks from scratch in under an hour, without assuming prior knowledge, pre-training, or data of any kind beyond task observations. Moreover, this work demonstrates that the learned world-model representation is capable of online transfer learning in the real world from scratch. In comparison to existing approaches, including IRL, RL, and Behavior Cloning (BC), which rely on more restrictive assumptions, the proposed approach demonstrates significantly greater sample efficiency and success rates, enabling a practical path forward for online world modeling and planning from observation and interaction. Videos and more at: https://uwrobotlearning.github.io/mpail2/.
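The abstract does not spell out the algorithm, but the paradigm it describes (recover a reward from action-free observations of the task, fit a world model from the robot's own interaction, and plan through that model) can be sketched at a very high level. The following is a minimal toy sketch, not the paper's implementation: the linear latent dynamics, the random-shooting planner, and every name here (`encode`, `observation_reward`, `plan`, `W_enc`, `w_r`) are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy dimensions; the actual method operates on camera images.
OBS_DIM, ACT_DIM, LATENT_DIM = 8, 2, 16
rng = np.random.default_rng(0)


def encode(obs, W_enc):
    """Map an observation to a latent feature (linear stand-in for a
    learned world-model representation)."""
    return np.tanh(obs @ W_enc)


def observation_reward(z, w_r):
    """Score a latent under a reward recovered by IRL from action-free
    demonstrations; higher means 'more like the observed task'."""
    return z @ w_r


def plan(z0, model, w_r, horizon=5, n_candidates=64):
    """Random-shooting planner: sample action sequences, roll them out in
    the learned latent dynamics, return the first action of the best one."""
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, ACT_DIM))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(candidates):
        z = z0
        for a in seq:
            z = np.tanh(model["A"] @ z + model["B"] @ a)  # latent transition
            returns[i] += observation_reward(z, w_r)
    return candidates[np.argmax(returns), 0]


# One step of the online loop: encode the current observation, plan, act.
# In the full setting, the world model, encoder, and reward would be updated
# online from interaction data; here they are random placeholders.
W_enc = rng.normal(size=(OBS_DIM, LATENT_DIM)) / np.sqrt(OBS_DIM)
model = {"A": rng.normal(size=(LATENT_DIM, LATENT_DIM)) / LATENT_DIM,
         "B": rng.normal(size=(LATENT_DIM, ACT_DIM)) / np.sqrt(ACT_DIM)}
w_r = rng.normal(size=LATENT_DIM)

obs = rng.normal(size=OBS_DIM)  # current camera/state reading
action = plan(encode(obs, W_enc), model, w_r)
print("planned action:", action)
```

In the setting the abstract describes, each placeholder above would be a learned component, and model fitting, reward inference, and planning would be interleaved online as the robot interacts with the task.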