EurekAgent: Agent Environment Engineering Is All You Need for Autonomous Scientific Discovery

Abstract

LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate on scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we argue that the bottleneck for autonomous scientific discovery is shifting from prescribing agent workflows to designing agent environments: the resources, constraints, and interfaces that shape agent behavior. We frame this as environment engineering: building environments that amplify productive behaviors, such as open-ended exploration, systematic artifact management, and inter-agent collaboration, while suppressing harmful ones, such as reward hacking and high-friction human oversight. We present EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery. EurekAgent engineers the environment along four dimensions: permissions engineering for bounded agent execution and isolated evaluation; artifact engineering for filesystem and Git-based collaboration; budget engineering for budget-aware exploration; and human-in-the-loop engineering for easy human supervision and intervention. EurekAgent sets new state-of-the-art results on multiple mathematics, kernel engineering, and machine learning tasks, including a new state-of-the-art result for 26-circle packing discovered with less than $11 in total API cost. We open-source our code and results, and advocate environment engineering as a core research direction for developing reliable autonomous research agents.