
Planning from Observation and Interaction

February 27, 2026
Authors: Tyler Han, Siyang Shen, Rohan Baijal, Harine Ravichandiran, Bat Nemekhbold, Kevin Huang, Sanghun Jung, Byron Boots
cs.AI

Abstract
Observational learning requires an agent to learn to perform a task by referencing only observations of the performed task. This work investigates the equivalent setting in real-world robot learning, where access to hand-designed rewards and demonstrator actions is not assumed. To address this data-constrained setting, this work presents a planning-based Inverse Reinforcement Learning (IRL) algorithm for world modeling from observation and interaction alone. Experiments conducted entirely in the real world demonstrate that this paradigm is effective for learning image-based manipulation tasks from scratch in under an hour, without assuming prior knowledge, pre-training, or data of any kind beyond task observations. Moreover, this work demonstrates that the learned world model representation is capable of online transfer learning in the real world from scratch. In comparison to existing approaches, including IRL, RL, and Behavior Cloning (BC), which have more restrictive assumptions, the proposed approach demonstrates significantly greater sample efficiency and success rates, enabling a practical path forward for online world modeling and planning from observation and interaction. Videos and more at: https://uwrobotlearning.github.io/mpail2/.
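The core loop the abstract describes — infer a reward signal from observation-only demonstrations, fit a world model from the robot's own interaction data, and plan against both — can be sketched in a toy 1-D setting. Everything below is an illustrative assumption rather than the authors' method: a linear world model, a nearest-transition distance standing in for the learned reward, and random-shooting MPC standing in for the paper's planner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D task: state x, action a, true (unknown) dynamics x' = x + a.
# Expert *observations only* (no actions) show x drifting toward the goal x = 1.
expert_obs = np.linspace(0.0, 1.0, 20)
expert_pairs = np.stack([expert_obs[:-1], expert_obs[1:]], axis=1)

# Learned linear world model x' ~ w0*x + w1*a, refit from interaction data.
w = np.zeros(2)

def reward(x, x_next):
    # Score a transition by closeness to the nearest expert observation
    # transition -- a crude stand-in for a learned discriminator/reward.
    d = np.abs(expert_pairs[:, 0] - x) + np.abs(expert_pairs[:, 1] - x_next)
    return -d.min()

def plan(x0, horizon=5, n_samples=64):
    # Random-shooting MPC under the learned model: roll out candidate action
    # sequences, score them with the inferred reward, execute the first
    # action of the best one.
    seqs = rng.uniform(-0.2, 0.2, size=(n_samples, horizon))
    rets = np.zeros(n_samples)
    for i, seq in enumerate(seqs):
        xi = x0
        for a in seq:
            xn = w[0] * xi + w[1] * a
            rets[i] += reward(xi, xn)
            xi = xn
    return seqs[rets.argmax(), 0]

# Online loop: explore briefly, then interact, refit the model, replan.
x, data_X, data_y = 0.0, [], []
for step in range(100):
    a = plan(x) if step > 5 else rng.uniform(-0.2, 0.2)
    x_next = x + a  # true environment step
    data_X.append([x, a]); data_y.append(x_next)
    w, *_ = np.linalg.lstsq(np.array(data_X), np.array(data_y), rcond=None)
    x = x_next

print(f"final state: {x:.2f} (goal is 1.0)")
```

Note that the stand-in reward scores observation *transitions* rather than single observations: with no demonstrator actions available, the pace and direction of the expert's motion is the only supervision the observations carry.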