Foresight: Failure Detection in Long-Horizon Robot Manipulation Using Action-Conditioned World Model Latents

Abstract

Long-horizon tasks are common in real-world robotic deployments, yet failure detection for such tasks remains underexplored. Detecting failures in long-horizon robotic tasks is particularly challenging because failure onset is often ambiguous and dense temporal annotations are typically unavailable. We present Foresight, a failure detection framework that monitors manipulation trajectories using latent representations from an action-conditioned world model. Foresight is trained using only final task-level success or failure labels. By leveraging predictive world-model embeddings, our method provides a unified framework for failure detection across different policies. We further use functional conformal prediction (FCP) to calibrate detection thresholds adaptively. We evaluate Foresight with state-of-the-art vision-language-action policies in simulation on LIBERO-Long, ManiSkill-Long, and BEHAVIOR-1K, compare it against state-of-the-art failure detection methods, and validate it on real robots with three long-horizon tasks on a ReactorX-200 arm and one task on a Franka arm. Our results suggest that action-conditioned world-model embeddings provide a scalable representation for reliable failure monitoring in long-horizon manipulation.