Robotic Offline RL from Internet Videos via Value-Function Pre-Training

September 22, 2023
Authors: Chethan Bhateja, Derek Guo, Dibya Ghosh, Anikait Singh, Manan Tomar, Quan Vuong, Yevgen Chebotar, Sergey Levine, Aviral Kumar
cs.AI

Abstract

Pre-training on Internet data has proven to be a key ingredient for broad generalization in many modern ML systems. What would it take to enable such capabilities in robotic reinforcement learning (RL)? Offline RL methods, which learn from datasets of robot experience, offer one way to incorporate prior data into the robotic learning pipeline. However, these methods have a "type mismatch" with video data (such as Ego4D), the largest prior datasets available for robotics, since video offers observation-only experience without the action or reward annotations needed for RL methods. In this paper, we develop a system for leveraging large-scale human video datasets in robotic offline RL, based entirely on learning value functions via temporal-difference learning. We show that value learning on video datasets learns representations that are more conducive to downstream robotic offline RL than other approaches for learning from video data. Our system, called V-PTR, combines the benefits of pre-training on video data with robotic offline RL approaches that train on diverse robot data, resulting in value functions and policies for manipulation tasks that perform better, act robustly, and generalize broadly. On several manipulation tasks on a real WidowX robot, our framework produces policies that greatly improve over prior methods. Our video and additional details can be found at https://dibyaghosh.com/vptr/.