ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Abstract

Achieving dexterous robotic manipulation in the real world relies heavily on human supervision and algorithm engineering, which is a central bottleneck in the pursuit of general physical intelligence. Although emerging coding agents can generate code to automate algorithm search, their successes remain largely confined to digital environments. We conjecture that the missing abstraction for automating robotics research is a repeatable feedback loop for real-world policy improvement: reset the scene, execute a policy, verify the outcome, and refine the next iteration. To bridge this gap, we introduce ENPIRE, a harness framework for coding agents that instantiates this physical feedback routine with four core modules: an Environment module (EN) for automatic reset and verification, a Policy Improvement module (PI) that launches policy refinement, a Rollout module (R) that evaluates policies on one or more physical robots operating in parallel, and an Evolution module (E) in which coding agents analyze logs, consult the literature, and improve training infrastructure and algorithm code to address failure modes. This closed-loop system turns real-world manipulation learning into a controllable optimization procedure, minimizing human effort while allowing fair ablations across training recipes and agent variants. With ENPIRE, frontier coding agents can autonomously train a policy that achieves a 99% success rate on challenging, dexterous manipulation tasks such as organizing a pin box, fastening a zip tie, and using tools, and the process accelerates further when we dispatch a team of agents on a fleet of robots. Our results suggest a practical and scalable path toward deploying coding agents to autonomously advance robotics in the physical world.
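To make the reset, execute, verify, refine loop concrete, the following is a minimal toy sketch in Python, not the ENPIRE implementation. Every name in it (ToyEnvironment, ToyPolicy, rollout, improve, evolve, run_loop) is hypothetical, the "robot" is a single scalar skill value, and the Evolution step is reduced to a rule that enlarges an update step when the logged success rate stalls. It only illustrates how the four modules named above could hand results to one another.

```python
from dataclasses import dataclass


@dataclass
class ToyPolicy:
    # Hypothetical stand-in for a learned policy: one scalar "skill" in [0, 1].
    skill: float = 0.2


class ToyEnvironment:
    """Stand-in for the Environment module (EN): automatic reset and verification."""

    def __init__(self) -> None:
        self.scene_index = -1
        self.difficulty = 0.0

    def reset(self) -> None:
        # Cycle deterministically through ten scene difficulties: 0.0, 0.1, ..., 0.9.
        self.scene_index += 1
        self.difficulty = (self.scene_index % 10) / 10

    def verify(self, action_quality: float) -> bool:
        # An episode succeeds when the executed action is good enough for the scene.
        return action_quality >= self.difficulty


def rollout(env: ToyEnvironment, policy: ToyPolicy, episodes: int) -> list[bool]:
    """Stand-in for the Rollout module (R): reset, execute, verify per episode."""
    outcomes = []
    for _ in range(episodes):
        env.reset()
        action_quality = policy.skill  # "executing" the toy policy
        outcomes.append(env.verify(action_quality))
    return outcomes


def improve(policy: ToyPolicy, outcomes: list[bool], step: float) -> ToyPolicy:
    """Stand-in for the Policy Improvement module (PI): refine from rollout results."""
    failure_rate = 1 - sum(outcomes) / len(outcomes)
    return ToyPolicy(skill=min(1.0, policy.skill + step * failure_rate))


def evolve(history: list[float], step: float) -> float:
    """Stand-in for the Evolution module (E): read the log, adjust the recipe."""
    stalled = len(history) >= 2 and history[-1] <= history[-2]
    return min(1.0, step * 2) if stalled else step


def run_loop(iterations: int = 50, episodes: int = 10, target: float = 0.99) -> list[float]:
    env, policy, step = ToyEnvironment(), ToyPolicy(), 0.2
    history: list[float] = []
    for _ in range(iterations):
        outcomes = rollout(env, policy, episodes)
        history.append(sum(outcomes) / len(outcomes))
        if history[-1] >= target:
            break
        step = evolve(history, step)
        policy = improve(policy, outcomes, step)
    return history


if __name__ == "__main__":
    print(run_loop())  # success rate per iteration, in order
```

In this sketch the loop stops as soon as the measured success rate reaches the target, which mirrors the idea of treating policy learning as an optimization procedure with an explicit, automatically verified objective. A real system would replace each stand-in with physical scene resets, robot execution, and an agent that edits training code.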