AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies

Abstract

In this paper, we propose AimBot, a lightweight visual augmentation technique that provides explicit spatial cues to improve visuomotor policy learning in robotic manipulation. AimBot overlays shooting lines and scope reticles onto multi-view RGB images, offering auxiliary visual guidance that encodes the end-effector's state. The overlays are computed from depth images, camera extrinsics, and the current end-effector pose, explicitly conveying spatial relationships between the gripper and objects in the scene. AimBot incurs minimal computational overhead (less than 1 ms) and requires no changes to model architectures, as it simply replaces original RGB images with augmented counterparts. Despite its simplicity, our results show that AimBot consistently improves the performance of various visuomotor policies in both simulation and real-world settings, highlighting the benefits of spatially grounded visual feedback.
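To make the overlay idea concrete, the following is a minimal sketch of how an end-effector position might be projected into a camera image and marked with a crosshair, using the depth image for an occlusion check. It is not the authors' implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the world-to-camera matrix convention, the depth tolerance, and every function name here are assumptions made for illustration, and the shooting-line and reticle rendering described in the abstract is not reproduced.

```python
# Hypothetical sketch: mark the projected end-effector position in an RGB image.
# Images are nested lists (rows of pixels) so the example has no dependencies.

def world_to_camera(T_cam_world, p_world):
    """Apply an assumed 4x4 world-to-camera extrinsic matrix to a 3D point."""
    x, y, z = p_world
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3]
        for row in T_cam_world[:3]
    )


def project(p_cam, fx, fy, cx, cy):
    """Pinhole projection to integer pixel (u, v); None if behind the camera."""
    X, Y, Z = p_cam
    if Z <= 0:
        return None
    return (int(round(fx * X / Z + cx)), int(round(fy * Y / Z + cy)))


def is_visible(depth, u, v, z, tol=0.01):
    """True if (u, v) is inside the image and the point is not behind the
    surface measured by the depth image (tol is an assumed tolerance)."""
    h, w = len(depth), len(depth[0])
    if not (0 <= v < h and 0 <= u < w):
        return False
    return z <= depth[v][u] + tol


def draw_crosshair(rgb, u, v, half=2, color=(255, 0, 0)):
    """Return a copy of rgb with a crosshair centered at (u, v), clipped to bounds."""
    h, w = len(rgb), len(rgb[0])
    out = [list(row) for row in rgb]
    for d in range(-half, half + 1):
        if 0 <= v < h and 0 <= u + d < w:
            out[v][u + d] = color  # horizontal arm
        if 0 <= u < w and 0 <= v + d < h:
            out[v + d][u] = color  # vertical arm
    return out


def overlay_end_effector(rgb, depth, T_cam_world, ee_pos_world, intrinsics):
    """Return an augmented image, or the input unchanged if the point is
    behind the camera, outside the frame, or occluded."""
    p_cam = world_to_camera(T_cam_world, ee_pos_world)
    uv = project(p_cam, *intrinsics)
    if uv is None or not is_visible(depth, uv[0], uv[1], p_cam[2]):
        return rgb
    return draw_crosshair(rgb, uv[0], uv[1])
```

Because the function returns an image of the same shape as its input, a policy could consume the augmented image in place of the original RGB frame without any architectural change, which is the property the abstract emphasizes.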
