Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance

Abstract

Large, general-purpose robotic policies trained on diverse demonstration datasets have been shown to be remarkably effective both for controlling a variety of robots in a range of different scenes, and for acquiring broad repertoires of manipulation skills. However, the data that such policies are trained on is generally of mixed quality: not only are human-collected demonstrations unlikely to perform the task perfectly, but the larger the dataset is, the harder it is to curate only the highest quality examples. It also remains unclear how optimal data from one embodiment is for training on another embodiment. In this paper, we present a general and broadly applicable approach that enhances the performance of such generalist robot policies at deployment time by re-ranking their actions according to a value function learned via offline RL. This approach, which we call Value-Guided Policy Steering (V-GPS), is compatible with a wide range of different generalist policies, without needing to fine-tune or even access the weights of the policy. We show that the same value function can improve the performance of five different state-of-the-art policies with different architectures, even though they were trained on distinct datasets, attaining consistent performance improvement on multiple robotic platforms across a total of 12 tasks. Code and videos can be found at: https://nakamotoo.github.io/V-GPS
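
The re-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that a black-box policy's actions are re-ranked by a value function learned via offline RL. The names `steer_action`, `make_toy_policy`, and `toy_q_value`, the number of sampled candidates, and the greedy (argmax) selection rule are all assumptions made for illustration, and the toy policy and toy value function stand in for a real generalist policy and a learned Q-function.

```python
import numpy as np


def steer_action(sample_actions, q_value, observation, num_samples=10):
    """Return the highest-scoring candidate action and all candidate scores.

    sample_actions and q_value are treated as black boxes, so no policy
    weights are touched. Both callables are hypothetical stand-ins.
    """
    # Ask the policy for several candidate actions for the same observation.
    candidates = np.asarray(sample_actions(observation, num_samples))
    # Score every candidate with the value function.
    scores = np.array([q_value(observation, a) for a in candidates])
    # Greedy selection (an assumption): execute the best-scoring candidate.
    best_index = int(np.argmax(scores))
    return candidates[best_index], scores


def make_toy_policy(seed=0):
    """Toy stochastic policy: proposes Gaussian actions (illustration only)."""
    rng = np.random.default_rng(seed)

    def sample_actions(observation, num_samples):
        return rng.normal(size=(num_samples, observation.shape[0]))

    return sample_actions


def toy_q_value(observation, action):
    """Toy value: prefers actions that move the state toward the origin."""
    return -float(np.linalg.norm(observation + action))


observation = np.array([1.0, -2.0])
action, scores = steer_action(
    make_toy_policy(), toy_q_value, observation, num_samples=16
)
print("chosen action:", action)
print("its score:", scores.max())
```

Because the sketch only calls the policy's sampling function and never reads its parameters, the same value function could in principle be paired with any policy that can propose multiple candidate actions, which matches the compatibility claim in the abstract.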
