GRAPE: Generalizing Robot Policy via Preference Alignment
November 28, 2024
Authors: Zijian Zhang, Kaiyuan Zheng, Zhaorun Chen, Joel Jang, Yi Li, Chaoqi Wang, Mingyu Ding, Dieter Fox, Huaxiu Yao
cs.AI
Abstract
Despite the recent advancements of vision-language-action (VLA) models on a
variety of robotics tasks, they suffer from critical issues such as poor
generalizability to unseen tasks, due to their reliance on behavior cloning
exclusively from successful rollouts. Furthermore, they are typically
fine-tuned to replicate demonstrations collected by experts under different
settings, thus introducing distribution bias and limiting their adaptability to
diverse manipulation objectives, such as efficiency, safety, and task
completion. To bridge this gap, we introduce GRAPE: Generalizing Robot Policy
via Preference Alignment. Specifically, GRAPE aligns VLAs on a trajectory level
and implicitly models reward from both successful and failed trials to boost
generalizability to diverse tasks. Moreover, GRAPE breaks down complex
manipulation tasks into independent stages and automatically guides preference
modeling through customized spatiotemporal constraints with keypoints proposed
by a large vision-language model. Notably, these constraints are flexible and
can be customized to align the model with varying objectives, such as safety,
efficiency, or task success. We evaluate GRAPE across a diverse array of tasks
in both real-world and simulated environments. Experimental results demonstrate
that GRAPE enhances the performance of state-of-the-art VLA models, increasing
success rates on in-domain and unseen manipulation tasks by 51.79% and 60.36%,
respectively. Additionally, GRAPE can be aligned with various objectives, such
as safety and efficiency, reducing collision rates by 44.31% and rollout
step-length by 11.15%, respectively. All code, models, and data are available
at https://grape-vla.github.io/.
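The central idea in the abstract — aligning the policy on whole trajectories by implicitly modeling reward from chosen versus rejected rollouts — can be illustrated with a minimal DPO-style loss computed over trajectory pairs. This is a rough sketch under stated assumptions, not the paper's actual objective: the function names, the per-step log-probability inputs, and the temperature `beta` are illustrative choices, and the real method (including its stage-wise spatiotemporal constraints) is defined in the full paper.

```python
import math


def trajectory_logprob(step_logprobs):
    # A trajectory's log-probability is the sum of its per-step
    # action log-probabilities under a given policy.
    return sum(step_logprobs)


def trajectory_preference_loss(policy_w, ref_w, policy_l, ref_l, beta=0.1):
    """DPO-style loss for one (preferred, rejected) trajectory pair.

    policy_w / policy_l: per-step action log-probs of the preferred ("winner")
    and rejected ("loser") trajectories under the policy being trained.
    ref_w / ref_l: the same quantities under a frozen reference policy.
    """
    # Implicit reward of each trajectory: beta-scaled log-ratio between
    # the trained policy and the reference policy.
    reward_w = trajectory_logprob(policy_w) - trajectory_logprob(ref_w)
    reward_l = trajectory_logprob(policy_l) - trajectory_logprob(ref_l)
    margin = beta * (reward_w - reward_l)
    # Negative log-sigmoid of the margin: the loss shrinks as the policy
    # assigns relatively more probability to the preferred trajectory.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With identical log-probs for both trajectories the margin is zero and the loss is log 2; raising the preferred trajectory's log-probability under the trained policy lowers the loss, which is the gradient signal that pulls the policy toward preferred rollouts.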