Affordance-based Robot Manipulation with Flow Matching

Abstract

We present a framework for assistive robot manipulation that addresses two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort; second, effectively learning robot trajectories by grounding the visual affordance model. We tackle the first challenge with a parameter-efficient prompt tuning method that prepends learnable text prompts to a frozen vision model to predict manipulation affordances in multi-task scenarios. We then propose to learn affordance-guided robot trajectories with a supervised flow matching method. Flow matching represents a robot visuomotor policy as a conditional process that flows random waypoints to desired robot trajectories. Finally, we introduce a real-world dataset with 10 tasks across Activities of Daily Living to test our framework. Our evaluation shows that the proposed prompt tuning method for learning manipulation affordance with a language prompter achieves competitive performance, and even outperforms other finetuning protocols across data scales, while remaining parameter-efficient. Learning multi-task robot trajectories with a single flow matching policy also performs consistently better than alternative behavior cloning methods, especially when the robot action distribution is multimodal. Our framework unifies affordance model learning and trajectory generation with flow matching for robot manipulation.
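
The idea of flowing random waypoints to a desired trajectory can be illustrated with a minimal sketch. It assumes a straight-line (linear interpolation) probability path and fixed-step Euler integration, which are common choices for flow matching but are not confirmed by the abstract. All function names, the waypoint values, and the oracle velocity field (a stand-in for a trained, affordance-conditioned network) are hypothetical.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to demonstration x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the linear path: the constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def fm_loss(velocity_fn, x0, x1, t):
    """Squared error between predicted and target velocity for one sample."""
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target_velocity(x0, x1)))

def euler_sample(velocity_fn, x0, n_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical demonstration: three 2D waypoints flattened as (x, y, x, y, ...).
demo = [0.0, 0.0, 0.5, 0.2, 1.0, 0.4]

# Random initial waypoints drawn from a standard Gaussian.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in demo]

def oracle_velocity(x, t):
    """Assumed stand-in for a trained network: always points at the demonstration."""
    return [(d - xi) / (1.0 - t) for d, xi in zip(demo, x)]

generated = euler_sample(oracle_velocity, noise, n_steps=10)
```

In a learned policy, `oracle_velocity` would be replaced by a network trained to minimize `fm_loss` over many noise and demonstration pairs, with the visual observation and affordance prediction passed as additional conditioning inputs.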
