Affordance-based Robot Manipulation with Flow Matching

September 2, 2024
Authors: Fan Zhang, Michael Gienger
cs.AI

Abstract

We present a framework for assistive robot manipulation that focuses on two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort; second, effectively learning robot trajectories by grounding the visual affordance model. We tackle the first challenge with a parameter-efficient prompt tuning method that prepends learnable text prompts to the frozen vision model to predict manipulation affordances in multi-task scenarios. We then learn robot trajectories guided by affordances with a supervised flow matching method. Flow matching represents a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot trajectories. Finally, we introduce a real-world dataset with 10 tasks across Activities of Daily Living to test our framework. Our extensive evaluation shows that the proposed prompt tuning method for learning manipulation affordances with a language prompter achieves competitive performance and even outperforms other finetuning protocols across data scales, while remaining parameter-efficient. Learning multi-task robot trajectories with a single flow matching policy also performs consistently better than alternative behavior cloning methods, especially given multimodal robot action distributions. Our framework seamlessly unifies affordance model learning and trajectory generation with flow matching for robot manipulation.
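
The idea of flowing random waypoints to a desired trajectory can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a straight-line path between noise and demonstration, which is one common flow matching choice that the abstract does not specify. All names (`interpolate`, `sample_trajectory`, `oracle_velocity`), the 8-waypoint 2D trajectory, and the goal-vector conditioning are hypothetical, and the closed-form `oracle_velocity` stands in for a trained network.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point at time t in [0, 1] on the straight line from noise x0 to target x1."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Regression target: the velocity of the straight-line path (constant in t)."""
    return x1 - x0

def flow_matching_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    error = velocity_fn(x_t, t, cond) - target_velocity(x0, x1)
    return float(np.mean(error ** 2))

def sample_trajectory(velocity_fn, x0, cond, steps=10):
    """Euler-integrate dx/dt = v(x, t, cond) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt, cond)
    return x

def oracle_velocity(x, t, cond):
    # Assumption: stand-in for a learned network. This is the ideal field when
    # the condition maps to exactly one trajectory (a line from origin to cond).
    target = np.linspace(0.0, 1.0, x.shape[0])[:, None] * cond
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
goal = np.array([1.0, 0.5])                      # hypothetical affordance-derived goal
demo = np.linspace(0.0, 1.0, 8)[:, None] * goal  # demonstrated waypoints, shape (8, 2)
noise = rng.standard_normal(demo.shape)          # random initial waypoints

generated = sample_trajectory(oracle_velocity, noise, goal, steps=10)
```

In practice the velocity function would be a neural network trained by minimizing `flow_matching_loss` over sampled noise, times, and demonstrations. The oracle here covers only the single-trajectory case and does not capture the multimodal action distributions the abstract mentions.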
