AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies

Abstract

In this paper, we propose AimBot, a lightweight visual augmentation technique that provides explicit spatial cues to improve visuomotor policy learning in robotic manipulation. AimBot overlays shooting lines and scope reticles onto multi-view RGB images, offering auxiliary visual guidance that encodes the end-effector's state. The overlays are computed from depth images, camera extrinsics, and the current end-effector pose, explicitly conveying spatial relationships between the gripper and objects in the scene. AimBot incurs minimal computational overhead (less than 1 ms) and requires no changes to model architectures, as it simply replaces the original RGB images with their augmented counterparts. Despite its simplicity, our results show that AimBot consistently improves the performance of various visuomotor policies in both simulation and real-world settings, highlighting the benefits of spatially grounded visual feedback.
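
To make the overlay computation concrete, the following is a minimal sketch of how a shooting line might be rendered from a depth image, a camera transform, and the end-effector pose. It is not the authors' implementation. The function names, the pinhole intrinsic matrix `K`, the world-to-camera matrix `T_cam_world`, the marching step, and the crosshair used as a reticle are all assumptions introduced for illustration; the abstract itself names only depth images, camera extrinsics, and the end-effector pose as inputs.

```python
import numpy as np


def shooting_line_pixels(ee_pos, ee_forward, depth, K, T_cam_world,
                         max_range=1.0, step=0.005):
    """March along the gripper's forward axis and collect the pixels it covers.

    Stops when the ray leaves the image, goes behind the camera, or reaches
    the surface recorded in the depth image. All parameters are hypothetical:
    K is a 3x3 pinhole intrinsic matrix, T_cam_world a 4x4 world-to-camera
    transform, depth an HxW array in the same metric units as ee_pos.
    """
    h, w = depth.shape
    R, t = T_cam_world[:3, :3], T_cam_world[:3, 3]
    pixels = []
    for i in range(int(max_range / step) + 1):
        p_world = ee_pos + i * step * ee_forward
        p_cam = R @ p_world + t          # world frame -> camera frame
        z = p_cam[2]
        if z <= 0:                       # point is behind the camera
            break
        u, v, _ = (K @ p_cam) / z        # pinhole projection
        r, c = int(round(v)), int(round(u))
        if not (0 <= r < h and 0 <= c < w):
            break                        # ray left the image
        if z >= depth[r, c]:             # ray reached the observed surface
            break                        # (a zero/invalid depth also stops here)
        if not pixels or pixels[-1] != (r, c):
            pixels.append((r, c))        # skip consecutive duplicates
    return pixels


def draw_overlay(rgb, pixels, color=(0, 255, 0), arm=3):
    """Return a copy of rgb with the line and a small crosshair at its end."""
    out = rgb.copy()
    if not pixels:
        return out
    h, w = out.shape[:2]
    for r, c in pixels:
        out[r, c] = color
    r0, c0 = pixels[-1]                  # crosshair stands in for the reticle
    out[r0, max(0, c0 - arm):min(w, c0 + arm + 1)] = color
    out[max(0, r0 - arm):min(h, r0 + arm + 1), c0] = color
    return out


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
    depth = np.ones((64, 64))            # flat surface 1 m from the camera
    rgb = np.zeros((64, 64, 3), dtype=np.uint8)
    line = shooting_line_pixels(np.array([-0.1, 0.0, 0.5]),
                                np.array([1.0, 0.0, 0.0]),
                                depth, K, np.eye(4))
    augmented = draw_overlay(rgb, line)
    print(len(line), augmented.shape)
```

Because the augmented image has the same shape and dtype as the input, a sketch like this can feed an existing policy without architectural changes, which is the property the abstract emphasizes.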
