InSight: Self-Guided Skill Acquisition via Steerable VLAs

June 23, 2026
Authors: Maggie Wang, Lars Osterberg, Stephen Tian, Ola Shorinwa, Jiajun Wu, Mac Schwager
cs.AI

Abstract

Vision-language-action (VLA) models can learn manipulation skills from demonstrations, but their capabilities are bounded by the skills in the training data. We present InSight, a framework that unlocks autonomous skill acquisition by rendering VLAs steerable at the primitive-action level (e.g., "move gripper to the bowl", "lift upward", "pour the bottle"). InSight consists of two primary stages: (1) an automated segmentation pipeline that partitions demonstrations into labeled primitives via VLM plan decomposition and end-effector poses to enable VLA primitive steerability, and (2) a VLM-guided data flywheel that identifies missing primitives required to accomplish a novel task, autonomously attempts demonstrations of the missing primitives with VLM-proposed low-level control, and automatically labels, stores, and integrates successful demonstrations into the VLA training set. We evaluate InSight across simulation and real-world manipulation tasks, including block flipping, drawer closing, sweeping, twisting, and pouring, without any human demonstrations of these target skills. Once learned, these primitives can be composed to execute novel, long-horizon tasks without additional human demonstrations. Our findings demonstrate that primitive steerability provides a practical foundation for continual skill acquisition in VLA policies. Project website: https://insight-vla.github.io.
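
The second stage (the data flywheel) can be pictured as a simple loop. The sketch below is only an illustration of the control flow described in the abstract, not the authors' implementation: `PrimitiveLibrary`, `flywheel_step`, the `attempt` callback (standing in for a rollout under VLM-proposed low-level control plus a success check), and the `max_tries` budget are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a trajectory is a list of end-effector positions.
Trajectory = List[Tuple[float, float, float]]


@dataclass
class PrimitiveLibrary:
    """Labeled primitive demonstrations, keyed by language label (assumed layout)."""
    demos: Dict[str, List[Trajectory]] = field(default_factory=dict)

    def knows(self, label: str) -> bool:
        # A primitive counts as known once at least one demonstration is stored.
        return bool(self.demos.get(label))

    def add(self, label: str, traj: Trajectory) -> None:
        self.demos.setdefault(label, []).append(traj)


def flywheel_step(
    plan: List[str],
    library: PrimitiveLibrary,
    attempt: Callable[[str], Optional[Trajectory]],
    max_tries: int = 3,
) -> List[str]:
    """One pass over a task plan; returns the labels newly added to the library.

    `plan` is the list of primitive labels a planner says the task needs.
    `attempt` is a stand-in for an autonomous rollout: it returns a trajectory
    on success and None on failure.
    """
    acquired: List[str] = []
    for label in plan:
        if library.knows(label):
            continue  # already covered by existing data, nothing to collect
        for _ in range(max_tries):
            traj = attempt(label)
            if traj is not None:
                library.add(label, traj)  # store the successful demo under its label
                acquired.append(label)
                break
    return acquired
```

In this sketch, retraining the policy on the grown library would happen outside `flywheel_step`; how often to retrain, and how success is judged, are left open here because the abstract does not specify them.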