Reinforcement Learning Guided Retrieval and Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities

Abstract

Robotic systems perceive the world through multiple input modalities, including visual camera streams and natural-language instructions, and must select appropriate actions based on these signals. However, assuming that every input device is permanently available is unrealistic: during deployment, sensors may fail, become occluded, or drop out entirely. Robust handling of such missing-modality scenarios is therefore essential for real-world robot operation. This paper introduces RL4IL, a reinforcement-learning-guided imitation learning method that selects the most suitable action for a given observation by identifying the most relevant expert demonstrations in a training library. A reinforcement learning policy, trained with Proximal Policy Optimisation over Breadth-First Search candidate sets, ranks the candidate demonstrations, and a soft cross-attention fusion head aggregates their action signals to produce the final prediction. When a modality is missing at inference time, a dedicated per-modality RL retrieval policy identifies donor demonstrations in the training library, and a soft imputation head reconstructs the missing embedding via cross-attention over the top-ranked donors, without retraining any part of the system. Experiments on three LIBERO benchmark suites show that RL4IL substantially outperforms state-of-the-art imitation learning methods under sensor-dropout conditions while requiring no policy network training. The code is available at https://github.com/h-ismkhan/Reinforcement-Learning-via-kNN-for-Robotic-Learning-with-Missing-Camera
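The retrieve-then-fuse mechanism summarised above can be illustrated with a minimal sketch. It assumes that a trained ranking policy has already assigned one relevance score per candidate demonstration (passed in here simply as `rank_scores`; the PPO training of that ranker is not shown), that demonstrations are stored as fixed-length embedding vectors, and that both the fusion head and the imputation head are single-head scaled dot-product cross-attention with identity projections in place of learned query/key/value weights. All function names (`top_k`, `soft_attend`, `fuse_actions`, `impute_missing`) are hypothetical and are not taken from the RL4IL repository.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(rank_scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring candidates.
    Assumption: `rank_scores` stands in for the output of the trained RL ranker."""
    order = sorted(range(len(rank_scores)), key=lambda i: rank_scores[i], reverse=True)
    return order[:k]


def soft_attend(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> Vector:
    """Scaled dot-product cross-attention with identity projections (simplification):
    the query comes from the current observation, keys/values from retrieved demos.
    Returns a softmax-weighted average of `values`."""
    d = len(query)
    logits = [sum(q * kv for q, kv in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(logits)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def fuse_actions(obs_emb: Vector, demo_embs: Sequence[Vector],
                 demo_actions: Sequence[Vector], rank_scores: Sequence[float],
                 k: int = 2) -> Vector:
    """Hypothetical fusion head: attend over the actions of the top-k ranked demos."""
    idx = top_k(rank_scores, k)
    return soft_attend(obs_emb, [demo_embs[i] for i in idx], [demo_actions[i] for i in idx])


def impute_missing(available_emb: Vector, donor_avail_embs: Sequence[Vector],
                   donor_missing_embs: Sequence[Vector], rank_scores: Sequence[float],
                   k: int = 2) -> Vector:
    """Hypothetical soft imputation head: keys are the donors' embeddings of the
    modality that IS observed, values are the donors' embeddings of the modality
    that is missing for the current observation."""
    idx = top_k(rank_scores, k)
    return soft_attend(available_emb,
                       [donor_avail_embs[i] for i in idx],
                       [donor_missing_embs[i] for i in idx])
```

Because the attention weights are a softmax, each fused action or imputed embedding is a convex combination of the retrieved donors' values, so the output always stays within the range spanned by the top-k demonstrations. With `k=1` the sketch degenerates to copying the single best-ranked demonstration, i.e. hard nearest-neighbour retrieval.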