VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
July 12, 2023
Authors: Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei
cs.AI
Abstract
Large language models (LLMs) are shown to possess a wealth of actionable
knowledge that can be extracted for robot manipulation in the form of reasoning
and planning. Despite the progress, most approaches still rely on pre-defined motion
primitives to carry out the physical interactions with the environment, which
remains a major bottleneck. In this work, we aim to synthesize robot
trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a
large variety of manipulation tasks given an open set of instructions and an
open set of objects. We achieve this by first observing that LLMs excel at
inferring affordances and constraints given a free-form language instruction.
More importantly, by leveraging their code-writing capabilities, they can
interact with a visual-language model (VLM) to compose 3D value maps to ground
the knowledge into the observation space of the agent. The composed value maps
are then used in a model-based planning framework to zero-shot synthesize
closed-loop robot trajectories with robustness to dynamic perturbations. We
further demonstrate how the proposed framework can benefit from online
experiences by efficiently learning a dynamics model for scenes that involve
contact-rich interactions. We present a large-scale study of the proposed
method in both simulated and real-robot environments, showcasing the ability to
perform a large variety of everyday manipulation tasks specified in free-form
natural language. Project website: https://voxposer.github.io
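
The idea of composing value maps and then planning over them can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a small voxel grid, a hand-specified target and obstacle position standing in for what the LLM and VLM would produce, and a plain Dijkstra search swapped in for the model-based planner described in the abstract. All function names and parameters (`distance_map`, `compose_cost_map`, `plan_path`, `avoid_radius`, `avoid_weight`) are hypothetical.

```python
import heapq
import itertools

import numpy as np


def distance_map(shape, center):
    """Euclidean distance (in voxels) from every voxel to `center`."""
    grid = np.indices(shape).astype(float)  # shape (3, X, Y, Z)
    offset = grid - np.asarray(center, dtype=float).reshape(3, 1, 1, 1)
    return np.linalg.norm(offset, axis=0)


def compose_cost_map(shape, target, obstacle, avoid_radius=3.0, avoid_weight=10.0):
    """Sum an affordance term and a constraint term into one cost volume.

    Assumption: lower cost is better, so the affordance is expressed as a
    normalized distance to the target, and the constraint as a penalty that
    is nonzero only within `avoid_radius` voxels of the obstacle.
    """
    affordance = distance_map(shape, target)
    affordance = affordance / affordance.max()
    constraint = np.clip(1.0 - distance_map(shape, obstacle) / avoid_radius, 0.0, None)
    return affordance + avoid_weight * constraint


def plan_path(cost, start, goal):
    """Dijkstra over the 26-connected voxel grid.

    The weight of each move is the cost of the voxel being entered.
    Returns the list of voxels from `start` to `goal`, inclusive.
    """
    offsets = [o for o in itertools.product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            break
        if dist > best[node]:
            continue  # stale queue entry
        for o in offsets:
            nxt = tuple(n + d for n, d in zip(node, o))
            if not all(0 <= c < s for c, s in zip(nxt, cost.shape)):
                continue  # outside the workspace
            new_dist = dist + float(cost[nxt])
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                parent[nxt] = node
                heapq.heappush(queue, (new_dist, nxt))
    # Walk parent links back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]
```

Calling `compose_cost_map((20, 20, 20), (17, 17, 17), (10, 10, 10))` and passing the result to `plan_path` with a start voxel such as `(2, 2, 2)` yields a sequence of voxels that ends at the target while steering around the penalized region. A real system would plan 6-DoF waypoints and replan in closed loop; this sketch covers only position and a single open-loop plan.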