RoboPocket: Improve Robot Policies Instantly with Your Phone

Abstract

Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators collect demonstrations blindly, without knowing the underlying policy's weaknesses, which leads to inefficient coverage of critical state distributions. Conversely, interactive methods like DAgger effectively address covariate shift but rely on physical robot execution, which is costly and difficult to scale. To reconcile this trade-off, we introduce RoboPocket, a portable system that enables Robot-Free Instant Policy Iteration using a single consumer smartphone. Its core innovation is a Remote Inference framework that visualizes the policy's predicted trajectory via Augmented Reality (AR) Visual Foresight. This immersive feedback allows collectors to proactively identify potential failures and focus data collection on the policy's weak regions without requiring a physical robot. Furthermore, we implement an asynchronous Online Finetuning pipeline that continuously updates the policy with incoming data, closing the learning loop within minutes. Extensive experiments demonstrate that RoboPocket adheres to data scaling laws and doubles data efficiency compared to offline scaling strategies, overcoming their long-standing efficiency bottleneck. Moreover, our instant iteration loop boosts sample efficiency by up to 2× in distributed environments with only a small number of interactive corrections per person. Project page and videos: https://robo-pocket.github.io.
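As a rough illustration of the asynchronous online finetuning idea described in the abstract, the following toy sketch streams demonstrations into a queue while a background thread updates a policy as data arrives. It is not the authors' implementation: `ToyPolicy`, `trainer_loop`, `run_demo`, the scalar linear policy, and the learning rate are all hypothetical choices made only to keep the example self-contained.

```python
import queue
import threading


class ToyPolicy:
    """Hypothetical stand-in for a learned policy: action = w * observation."""

    def __init__(self):
        self.w = 0.0
        self.version = 0  # incremented after every finetuning round

    def predict(self, obs):
        return self.w * obs

    def finetune(self, batch, lr=0.1, steps=50):
        # Plain gradient descent on mean squared error over the batch.
        for _ in range(steps):
            grad = sum(2 * (self.w * o - a) * o for o, a in batch) / len(batch)
            self.w -= lr * grad
        self.version += 1


def trainer_loop(policy, incoming, dataset, stop):
    """Consume newly collected demonstrations and finetune in the background."""
    while not stop.is_set() or not incoming.empty():
        try:
            demo = incoming.get(timeout=0.05)
        except queue.Empty:
            continue
        dataset.append(demo)      # aggregate the new demonstration
        policy.finetune(dataset)  # update the policy on all data so far
        incoming.task_done()


def run_demo():
    policy, dataset = ToyPolicy(), []
    incoming, stop = queue.Queue(), threading.Event()
    worker = threading.Thread(
        target=trainer_loop, args=(policy, incoming, dataset, stop)
    )
    worker.start()
    # The "collector" streams (observation, expert_action) pairs.
    # The assumed expert behaviour here is action = 2 * observation.
    for obs in (0.5, 1.0, 1.5, 2.0):
        incoming.put((obs, 2.0 * obs))
    incoming.join()  # wait until every demonstration has been consumed
    stop.set()
    worker.join()
    return policy, dataset
```

Calling `run_demo()` returns the updated policy and the aggregated dataset. In a real system the collector would keep querying the latest policy version while collecting, which is the part of the loop this sketch leaves out.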
