Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments
September 9, 2024
Authors: Haritheja Etukuru, Norihito Naka, Zijin Hu, Seungjae Lee, Julian Mehu, Aaron Edsinger, Chris Paxton, Soumith Chintala, Lerrel Pinto, Nur Muhammad Mahi Shafiullah
cs.AI
Abstract
Robot models, particularly those trained with large amounts of data, have recently shown a plethora of real-world manipulation and navigation capabilities. Several independent efforts have shown that, given sufficient training data in an environment, robot policies can generalize to demonstrated variations in that environment. However, needing to fine-tune robot models for every new environment stands in stark contrast to models in language or vision that can be deployed zero-shot for open-world problems. In this work, we present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies that can directly generalize to new environments without any fine-tuning. To create RUMs efficiently, we develop new tools to quickly collect data for mobile manipulation tasks, integrate such data into a policy with multi-modal imitation learning, and deploy policies on-device on the Hello Robot Stretch, a cheap commodity robot, with an external mLLM verifier for retrying. We train five such utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects. Our system achieves, on average, a 90% success rate in unseen, novel environments while interacting with unseen objects. Moreover, the utility models can also succeed on different robot and camera set-ups with no further data, training, or fine-tuning. Primary among our lessons are the importance of training data over the training algorithm and policy class, guidance on data scaling, the necessity of diverse yet high-quality demonstrations, and a recipe for robot introspection and retrying to improve performance in individual environments. Our code, data, models, hardware designs, as well as our experiment and deployment videos, are open-sourced and can be found on our project website: https://robotutilitymodels.com
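
The deployment recipe the abstract describes (run the policy, have an external multimodal LLM judge the outcome from a camera image, and retry on failure) can be made concrete with a short sketch. The Python below is a hypothetical illustration only, assuming generic Robot, Policy, and Verifier interfaces; none of these names come from the paper's released code.

from typing import Callable, Protocol


class Robot(Protocol):
    """Illustrative robot interface (an assumption, not the authors' API)."""
    def reset_to_home(self) -> object: ...                 # return initial observation
    def step(self, action: object) -> tuple[object, bool]: ...  # (next obs, done flag)
    def capture_image(self) -> bytes: ...                  # final scene for verification


Policy = Callable[[object], object]      # observation -> action
Verifier = Callable[[bytes, str], bool]  # (final image, task description) -> success?


def run_with_retries(policy: Policy, robot: Robot, verifier: Verifier,
                     task: str, max_attempts: int = 3) -> bool:
    """Execute the policy; let an external verifier decide whether to retry."""
    for _ in range(max_attempts):
        obs = robot.reset_to_home()              # retry each attempt from a known start pose
        done = False
        while not done:
            obs, done = robot.step(policy(obs))  # closed-loop policy rollout
        if verifier(robot.capture_image(), task):
            return True                          # verifier judged the task complete
    return False                                 # all attempts exhausted

In the paper's setup the verifier role is played by an external mLLM prompted with the final scene image; here it is abstracted as any (image, task) -> bool callable, so a vision-language-model query, a learned success classifier, or a human judge could be plugged in without changing the retry loop.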