
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments

September 9, 2024
Authors: Haritheja Etukuru, Norihito Naka, Zijin Hu, Seungjae Lee, Julian Mehu, Aaron Edsinger, Chris Paxton, Soumith Chintala, Lerrel Pinto, Nur Muhammad Mahi Shafiullah
cs.AI

Abstract

Robot models, particularly those trained with large amounts of data, have recently shown a plethora of real-world manipulation and navigation capabilities. Several independent efforts have shown that given sufficient training data in an environment, robot policies can generalize to demonstrated variations in that environment. However, needing to finetune robot models to every new environment stands in stark contrast to models in language or vision that can be deployed zero-shot for open-world problems. In this work, we present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies that can directly generalize to new environments without any finetuning. To create RUMs efficiently, we develop new tools to quickly collect data for mobile manipulation tasks, integrate such data into a policy with multi-modal imitation learning, and deploy policies on-device on Hello Robot Stretch, a cheap commodity robot, with an external mLLM verifier for retrying. We train five such utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects. Our system, on average, achieves 90% success rate in unseen, novel environments interacting with unseen objects. Moreover, the utility models can also succeed in different robot and camera set-ups with no further data, training, or fine-tuning. Primary among our lessons are the importance of training data over training algorithm and policy class, guidance about data scaling, necessity for diverse yet high-quality demonstrations, and a recipe for robot introspection and retrying to improve performance on individual environments. Our code, data, models, hardware designs, as well as our experiment and deployment videos are open sourced and can be found on our project website: https://robotutilitymodels.com
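The abstract's deployment recipe pairs an on-device policy with an external mLLM verifier that judges task success and triggers retries. The sketch below illustrates that retry loop only at the interface level; the function names, the `max_attempts` budget, and the toy policy/verifier stand-ins are all hypothetical, not the paper's actual implementation (which runs a learned policy on a Hello Robot Stretch and queries a real multimodal LLM).

```python
# Hypothetical sketch of the verify-and-retry pattern described in the
# abstract: roll out the policy, ask an external verifier whether the
# task succeeded, and retry up to a fixed budget on failure.

def run_with_retries(policy, verifier, env, max_attempts=3):
    """Execute `policy` in `env`, retrying until `verifier` reports success.

    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        observation = env.reset()          # re-home the robot between tries
        final_obs = policy(observation)    # one full task rollout
        if verifier(final_obs):            # e.g. an mLLM judging from camera images
            return True, attempt
    return False, max_attempts

# Minimal stand-ins so the sketch runs end to end: an environment stub,
# a "policy" that only succeeds on its second rollout, and a verifier
# that reads the rollout's outcome flag.
class ToyEnv:
    def reset(self):
        return {}

rollouts = []

def flaky_policy(obs):
    rollouts.append(obs)
    return {"success": len(rollouts) >= 2}

succeeded, attempts = run_with_retries(flaky_policy, lambda o: o["success"], ToyEnv())
```

The loop re-homes the robot before each attempt, matching the abstract's framing of retrying as introspection at the whole-task level rather than mid-trajectory correction.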

