ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Abstract

Dexterous robotic manipulation in the real world still relies heavily on human supervision and algorithm engineering, which is a central bottleneck in the pursuit of general physical intelligence. Although emerging coding agents can generate code to automate algorithm search, their successes remain largely confined to digital environments. We conjecture that the missing abstraction for automating robotics research is a repeatable feedback loop for real-world policy improvement: reset the scene, execute a policy, verify the outcome, and refine the next iteration. To bridge this gap, we introduce ENPIRE, a harness framework for coding agents that instantiates this physical feedback routine with four core modules: an Environment module (EN) for automatic reset and verification; a Policy Improvement module (PI) that launches policy refinement; a Rollout module (R) that evaluates policies on one or more physical robots operating in parallel; and an Evolution module (E) in which coding agents analyze logs, consult the literature, and improve training infrastructure and algorithm code to address failure modes. This closed-loop system turns real-world manipulation learning into a controllable optimization procedure, minimizing human effort while allowing fair ablations across training recipes and agent variants. Powered by ENPIRE, frontier coding agents can autonomously train a policy to a 99% success rate on challenging, dexterous manipulation tasks such as organizing a pin box, fastening a zip tie, and using tools. The process accelerates further when we dispatch a team of agents to a robot fleet. Our results suggest a practical and scalable path toward deploying coding agents to autonomously advance robotics in the physical world.
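To make the feedback loop concrete, the following is a minimal toy sketch of the reset, execute, verify, and refine cycle. It is not the ENPIRE implementation: every name here (`ToyPolicy`, `ToyEnvironment`, `rollout`, `improve`, `evolve`, `self_improve`) is hypothetical, the "policy" is reduced to a single success probability, and the physical robot is replaced by a seeded random number generator.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyPolicy:
    # Hypothetical stand-in for a learned policy: chance that one episode succeeds.
    skill: float


class ToyEnvironment:
    """Toy stand-in for an environment module: resets the scene and verifies outcomes."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.resets = 0

    def reset(self) -> None:
        # A real system would physically restore the scene; here we only count resets.
        self.resets += 1

    def run_and_verify(self, policy: ToyPolicy) -> bool:
        # Execute the policy once and report whether the (simulated) task succeeded.
        return self.rng.random() < policy.skill


def rollout(env: ToyEnvironment, policy: ToyPolicy, episodes: int) -> float:
    """Toy rollout step: measure the success rate over several episodes."""
    successes = 0
    for _ in range(episodes):
        env.reset()
        if env.run_and_verify(policy):
            successes += 1
    return successes / episodes


def improve(policy: ToyPolicy, step: float) -> ToyPolicy:
    """Toy policy-improvement step: nudge the skill upward, capped at 1.0."""
    return ToyPolicy(min(1.0, policy.skill + step))


def evolve(history: List[float], step: float) -> float:
    """Toy evolution step: if the last iteration did not help, change the recipe."""
    if len(history) >= 2 and history[-1] <= history[-2]:
        return step * 2
    return step


def self_improve(
    env: ToyEnvironment,
    policy: ToyPolicy,
    target: float = 0.99,
    episodes: int = 100,
    max_iters: int = 20,
) -> Tuple[ToyPolicy, List[float]]:
    """Repeat evaluate -> analyze -> refine until the target success rate is met."""
    history: List[float] = []
    step = 0.1
    for _ in range(max_iters):
        rate = rollout(env, policy, episodes)
        history.append(rate)
        if rate >= target:
            break
        step = evolve(history, step)
        policy = improve(policy, step)
    return policy, history


if __name__ == "__main__":
    final_policy, rates = self_improve(ToyEnvironment(seed=0), ToyPolicy(skill=0.5))
    print(f"iterations: {len(rates)}, final success rate: {rates[-1]:.2f}")
```

In this sketch the loop stops as soon as the measured success rate reaches the target, which mirrors the idea of treating real-world learning as a controllable optimization procedure. The update rules themselves are placeholders chosen only so that the example terminates.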