
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

June 11, 2026
Authors: Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao, Fanjin Zhang, Jian Song, Lei Hou, Juanzi Li
cs.AI

Abstract

LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we argue that the bottleneck for autonomous scientific discovery is shifting from prescribing agent workflows to designing agent environments: the resources, constraints, and interfaces that shape agent behavior. We frame this as environment engineering: building environments that amplify productive behaviors, such as open-ended exploration, systematic artifact management, and inter-agent collaboration, while suppressing harmful behaviors, such as reward hacking and high-friction human oversight. We present EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery. EurekAgent engineers the environment along four dimensions: permissions engineering for bounded agent execution and isolated evaluation; artifact engineering for filesystem and Git-based collaboration; budget engineering for budget-aware exploration; and human-in-the-loop engineering for easy human supervision and intervention. EurekAgent sets new state-of-the-art results on multiple mathematics, kernel engineering, and machine learning tasks, including new state-of-the-art 26-circle packing results discovered with less than $11 in total API cost. We open-source our code and results, and call for environment engineering as a core research direction for developing reliable autonomous research agents.
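The abstract names budget-aware exploration and isolated evaluation but does not describe how either is implemented. The sketch below is a minimal illustration under stated assumptions, not EurekAgent's actual code. It assumes the 26-circle packing task means maximizing the sum of radii of 26 non-overlapping circles inside the unit square. It models the API budget as a fixed, hypothetical cost per proposal and replaces the LLM proposer with a hypothetical random-jitter function. All names (`evaluate_packing`, `Budget`, `propose`, `search`) are illustrative. Isolation is represented only as a pure scoring function that the proposer cannot alter; a real system would presumably run the evaluator in a separate sandboxed process.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Circle = Tuple[float, float, float]  # (x, y, r)


def evaluate_packing(circles: List[Circle], tol: float = 1e-9) -> Optional[float]:
    """Assumed objective (not stated in the abstract): sum of radii of
    non-overlapping circles inside the unit square. Returns None if invalid."""
    for x, y, r in circles:
        if r <= 0 or min(x - r, y - r) < -tol or max(x + r, y + r) > 1 + tol:
            return None  # circle leaves the unit square
    for i, (xi, yi, ri) in enumerate(circles):
        for xj, yj, rj in circles[i + 1:]:
            if math.hypot(xi - xj, yi - yj) < ri + rj - tol:
                return None  # two circles overlap
    return sum(r for _, _, r in circles)


@dataclass
class Budget:
    """Hypothetical USD budget; every proposal is charged before it runs."""
    limit_usd: float
    spent_usd: float = 0.0

    def try_charge(self, cost_usd: float) -> bool:
        if self.spent_usd + cost_usd > self.limit_usd:
            return False  # refuse calls that would exceed the budget
        self.spent_usd += cost_usd
        return True


def propose(best: List[Circle], rng: random.Random) -> List[Circle]:
    """Hypothetical stand-in for an LLM proposal: jitter one circle."""
    candidate = list(best)
    k = rng.randrange(len(candidate))
    x, y, r = candidate[k]
    candidate[k] = (x + rng.gauss(0, 0.01), y + rng.gauss(0, 0.01),
                    r * rng.uniform(0.98, 1.05))
    return candidate


def search(initial: List[Circle], budget: Budget, cost_per_call: float, seed: int = 0):
    """Hill-climb on the evaluator's score until the budget is exhausted."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate_packing(initial)
    if best_score is None:
        raise ValueError("initial packing is invalid")
    while budget.try_charge(cost_per_call):
        candidate = propose(best, rng)
        score = evaluate_packing(candidate)  # scoring is independent of the proposer
        if score is not None and score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Valid starting point: 26 circles of radius 1/12 on a 6 x 5 grid.
grid = [((2 * i + 1) / 12, (2 * j + 1) / 12, 1 / 12)
        for j in range(5) for i in range(6)][:26]
best, score = search(grid, Budget(limit_usd=11.0), cost_per_call=0.01)
print(f"sum of radii: {score:.4f}")
```

Charging the budget before each proposal means the loop can never overspend, and because only the evaluator decides whether a candidate counts, the proposer cannot raise its score by editing the metric. That separation is a simple guard against the reward hacking the abstract mentions.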