

Affordance-based Robot Manipulation with Flow Matching

September 2, 2024
作者: Fan Zhang, Michael Gienger
cs.AI

Abstract

We present a framework for assistive robot manipulation that focuses on two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort; second, effectively learning robot trajectories by grounding the visual affordance model. We tackle the first challenge by employing a parameter-efficient prompt tuning method that prepends learnable text prompts to the frozen vision model to predict manipulation affordances in multi-task scenarios. We then propose to learn robot trajectories guided by affordances with a supervised flow matching method. Flow matching represents a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot trajectories. Finally, we introduce a real-world dataset with 10 tasks across Activities of Daily Living to test our framework. Our extensive evaluation highlights that the proposed prompt tuning method for learning manipulation affordances with a language prompter achieves competitive performance, and even outperforms other finetuning protocols across data scales, while remaining parameter-efficient. Learning multi-task robot trajectories with a single flow matching policy also leads to consistently better performance than alternative behavior cloning methods, especially given multimodal robot action distributions. Our framework seamlessly unifies affordance model learning and trajectory generation with flow matching for robot manipulation.
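To make the flow matching idea concrete, the sketch below illustrates the conditional process the abstract describes: random waypoints are linearly interpolated toward a desired trajectory, the interpolation defines a velocity target for training, and at inference the velocity field is Euler-integrated to transport noise into a trajectory. This is a minimal illustration under assumed shapes (8 waypoints in 2D), not the authors' implementation; in the actual policy, a learned network conditioned on the affordance prediction would replace the ground-truth velocity used here.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 8                                  # waypoints per trajectory (hypothetical)
x1 = np.linspace([0.0, 0.0], [1.0, 1.0], T)  # a desired robot trajectory (data)
x0 = rng.normal(size=(T, 2))                 # random waypoints to flow from (noise)

# Training view: sample a time t, build the interpolant x_t, and regress a
# network v_theta(x_t, t, affordance) onto the conditional velocity target.
t = 0.3
x_t = (1 - t) * x0 + t * x1            # point on the straight flow from x0 to x1
v_target = x1 - x0                     # velocity target along that flow

# Inference view: Euler-integrate the velocity field from t=0 to t=1.
# With the ground-truth constant field, integration recovers x1 exactly.
x = x0.copy()
n_steps = 10
for _ in range(n_steps):
    x = x + v_target / n_steps

assert np.allclose(x, x1)
```

In the paper's setting, `v_target` would come from a trained visuomotor network conditioned on the affordance map, so the same integration loop generates trajectories for unseen scenes.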

