

PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies

September 22, 2025
作者: Jesse Zhang, Marius Memmel, Kevin Kim, Dieter Fox, Jesse Thomason, Fabio Ramos, Erdem Bıyık, Abhishek Gupta, Anqi Li
cs.AI

Abstract

Robotic manipulation policies often fail to generalize because they must simultaneously learn where to attend, what actions to take, and how to execute them. We argue that high-level reasoning about where and what can be offloaded to vision-language models (VLMs), leaving policies to specialize in how to act. We present PEEK (Policy-agnostic Extraction of Essential Keypoints), which fine-tunes VLMs to predict a unified point-based intermediate representation: (1) end-effector paths specifying what actions to take, and (2) task-relevant masks indicating where to focus. These annotations are directly overlaid onto robot observations, making the representation policy-agnostic and transferable across architectures. To enable scalable training, we introduce an automatic annotation pipeline that generates labeled data across 20+ robot datasets spanning 9 embodiments. In real-world evaluations, PEEK consistently boosts zero-shot generalization, including a 41.4x real-world improvement for a 3D policy trained only in simulation, and 2-3.5x gains for both large VLAs and small manipulation policies. By letting VLMs absorb semantic and visual complexity, PEEK equips manipulation policies with the minimal cues they need: where, what, and how. Website at https://peek-robot.github.io/.
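
The abstract does not specify how the predicted points and masks are composited onto observations. Purely as an illustration, below is a minimal Python sketch, assuming hypothetical pixel-space inputs: a list of end-effector waypoints ("what") and point centers of task-relevant regions ("where"). The function name `overlay_peek_annotations` and the patch-based masking scheme are assumptions for illustration, not the paper's actual method.

```python
import numpy as np
import cv2

def overlay_peek_annotations(obs_rgb, path_points, mask_points, patch=12):
    """Illustrative sketch (not the paper's implementation): overlay a
    VLM-predicted end-effector path ("what") and a task-relevance mask
    ("where") onto a robot camera observation.

    obs_rgb:     HxWx3 uint8 camera image.
    path_points: list of (x, y) pixel waypoints for the end effector.
    mask_points: list of (x, y) pixel centers of task-relevant regions.
    patch:       half-width of the square kept around each relevant point.
    """
    h, w = obs_rgb.shape[:2]

    # "Where": keep only square patches around task-relevant points,
    # zeroing out the rest so the policy sees a minimal view.
    mask = np.zeros((h, w), dtype=np.uint8)
    for (x, y) in mask_points:
        x, y = int(x), int(y)
        cv2.rectangle(mask, (x - patch, y - patch), (x + patch, y + patch),
                      255, thickness=-1)  # filled square; OpenCV clips bounds
    out = cv2.bitwise_and(obs_rgb, obs_rgb, mask=mask)

    # "What": draw the predicted end-effector path as a polyline,
    # with a dot at each waypoint.
    pts = np.array(path_points, dtype=np.int32)
    cv2.polylines(out, [pts], isClosed=False, color=(0, 255, 0), thickness=2)
    for (x, y) in path_points:
        cv2.circle(out, (int(x), int(y)), 4, (0, 0, 255), thickness=-1)
    return out
```

Masking out irrelevant pixels, rather than merely highlighting relevant ones, is one way to realize the "minimal" representation the abstract describes; whether PEEK masks, crops, or annotates differently is not stated here.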