AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies
August 11, 2025
Authors: Yinpei Dai, Jayjun Lee, Yichi Zhang, Ziqiao Ma, Jed Yang, Amir Zadeh, Chuan Li, Nima Fazeli, Joyce Chai
cs.AI
Abstract
In this paper, we propose AimBot, a lightweight visual augmentation technique
that provides explicit spatial cues to improve visuomotor policy learning in
robotic manipulation. AimBot overlays shooting lines and scope reticles onto
multi-view RGB images, offering auxiliary visual guidance that encodes the
end-effector's state. The overlays are computed from depth images, camera
extrinsics, and the current end-effector pose, explicitly conveying spatial
relationships between the gripper and objects in the scene. AimBot incurs
minimal computational overhead (less than 1 ms) and requires no changes to
model architectures, as it simply replaces original RGB images with augmented
counterparts. Despite its simplicity, our results show that AimBot consistently
improves the performance of various visuomotor policies in both simulation and
real-world settings, highlighting the benefits of spatially grounded visual
feedback.
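
To make the overlay idea concrete, the following is a minimal sketch of how a reticle could be placed in one camera view from an end-effector position, camera extrinsics, and a depth image. It is not the authors' implementation: the abstract does not specify how the shooting lines and reticles are computed, so the pinhole intrinsics (fx, fy, cx, cy), the occlusion test against the depth image, the crosshair shape, and all function names here are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def world_to_camera(T_cam_world: Matrix,
                    p_world: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Apply a 4x4 world-to-camera transform (the extrinsics) to a 3D point."""
    x, y, z = p_world
    cam = [T_cam_world[r][0] * x + T_cam_world[r][1] * y
           + T_cam_world[r][2] * z + T_cam_world[r][3] for r in range(3)]
    return cam[0], cam[1], cam[2]


def project(p_cam: Tuple[float, float, float],
            fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection to pixel (u, v); None if the point is behind the camera."""
    x, y, z = p_cam
    if z <= 0:
        return None
    return round(fx * x / z + cx), round(fy * y / z + cy)


def draw_reticle(rgb: Image, depth: List[List[float]], T_cam_world: Matrix,
                 intrinsics: Tuple[float, float, float, float],
                 p_world: Tuple[float, float, float],
                 half_len: int = 2, color: Pixel = (255, 0, 0),
                 tol: float = 0.01) -> Image:
    """Return a copy of `rgb` with a crosshair at the projected point.

    The image is returned unchanged if the point is out of view or if the
    depth image reports a surface in front of it (assumed occlusion rule).
    """
    out = [row[:] for row in rgb]
    p_cam = world_to_camera(T_cam_world, p_world)
    pixel = project(p_cam, *intrinsics)
    if pixel is None:
        return out
    u, v = pixel
    h, w = len(rgb), len(rgb[0])
    if not (0 <= u < w and 0 <= v < h):
        return out
    if depth[v][u] + tol < p_cam[2]:  # scene surface is closer than the point
        return out
    for d in range(-half_len, half_len + 1):
        if 0 <= u + d < w:
            out[v][u + d] = color  # horizontal arm
        if 0 <= v + d < h:
            out[v + d][u] = color  # vertical arm
    return out


# Toy usage: 9x9 black image, flat scene 1 m away, camera at the world origin.
identity = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
black = [[(0, 0, 0)] * 9 for _ in range(9)]
flat_depth = [[1.0] * 9 for _ in range(9)]
augmented = draw_reticle(black, flat_depth, identity, (4.0, 4.0, 4.0, 4.0),
                         (0.0, 0.0, 0.5))
```

In this sketch the policy would be fed `augmented` in place of the original frame, which mirrors the abstract's point that no architectural change is needed. A line-style cue could be drawn the same way by projecting several points along the gripper's forward axis, though that too is an assumption about the method.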