Do You Need Proprioceptive States in Visuomotor Policies?

Abstract

Imitation-learning-based visuomotor policies are widely used in robot manipulation, and they typically take both visual observations and proprioceptive states as input for precise control. In this study, we find that this common practice makes the policy overly reliant on the proprioceptive state input, which causes overfitting to the training trajectories and degrades spatial generalization. In contrast, we propose the State-free Policy, which removes the proprioceptive state input and predicts actions conditioned only on visual observations. The State-free Policy is built in the relative end-effector action space and requires full coverage of the task-relevant visual observations, provided here by dual wide-angle wrist cameras. Empirical results show that the State-free Policy achieves significantly stronger spatial generalization than the state-based policy: in real-world tasks such as pick-and-place, challenging shirt-folding, and complex whole-body manipulation, spanning multiple robot embodiments, the average success rate improves from 0% to 85% in height generalization and from 6% to 64% in horizontal generalization. The State-free Policy also shows advantages in data efficiency and cross-embodiment adaptation, which makes it more practical for real-world deployment.
