EurekAgent: Agent Environment Engineering Is All You Need for Autonomous Scientific Discovery

Abstract

LLM-based agents have shown increasing potential for automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate on scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we argue that the bottleneck for autonomous scientific discovery is shifting from prescribing agent workflows to designing agent environments: the resources, constraints, and interfaces that shape agent behavior. We frame this as environment engineering: building environments that amplify productive behaviors, such as open-ended exploration, systematic artifact management, and inter-agent collaboration, while suppressing harmful behaviors, such as reward hacking and high-friction human oversight. We present EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery. EurekAgent engineers the environment along four dimensions: permissions engineering for bounded agent execution and isolated evaluation; artifact engineering for filesystem and Git-based collaboration; budget engineering for budget-aware exploration; and human-in-the-loop engineering for easy human supervision and intervention. EurekAgent sets new state-of-the-art results on multiple mathematics, kernel engineering, and machine learning tasks, including new state-of-the-art 26-circle packing results discovered with less than $11 in total API cost. We open-source our code and results, and propose environment engineering as a core research direction for developing reliable autonomous research agents.