SR-Scientist: Scientific Equation Discovery With Agentic AI
October 13, 2025
Authors: Shijie Xia, Yuhan Sun, Pengfei Liu
cs.AI
Abstract
Recently, Large Language Models (LLMs) have been applied to scientific
equation discovery, leveraging their embedded scientific knowledge for
hypothesis generation. However, current methods typically confine LLMs to the
role of an equation proposer within search algorithms like genetic programming.
In this paper, we present SR-Scientist, a framework that elevates the LLM from
a simple equation proposer to an autonomous AI scientist that writes code to
analyze data, implements the equation as code, submits it for evaluation, and
optimizes the equation based on experimental feedback. Specifically, we wrap
the code interpreter into a set of tools for data analysis and equation
evaluation. The agent is instructed to optimize the equation by utilizing these
tools over a long horizon with minimal human-defined pipelines. Empirical
results show that SR-Scientist outperforms baseline methods by an absolute
margin of 6% to 35% on datasets covering four science disciplines.
Additionally, we demonstrate our method's robustness to noise, the
generalization of the discovered equations to out-of-domain data, and their
symbolic accuracy. Furthermore, we develop an end-to-end reinforcement learning
framework to enhance the agent's capabilities.
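The abstract describes an agent that implements a candidate equation as code, submits it to an evaluation tool, and revises it based on the feedback. The sketch below illustrates that evaluate-and-refine cycle under stated simplifying assumptions. Candidates are linear combinations of named basis terms whose coefficients are fitted by least squares. The score is normalized mean squared error (NMSE) on held-out, out-of-domain points. A fixed list of proposals stands in for the LLM. The names `evaluate_equation` and `discovery_loop`, the NMSE metric, and the early-stopping threshold are hypothetical illustrations, not SR-Scientist's actual tools or API.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# ASSUMPTION: a candidate equation is a list of named basis terms phi_k(x), and
# fitting finds coefficients c_k so that y ≈ sum_k c_k * phi_k(x). Restricting to
# linear-in-parameters forms keeps this sketch dependency-free.
Term = Tuple[str, Callable[[float], float]]
Data = Tuple[Sequence[float], Sequence[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: basis terms are linearly dependent")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * c[k] for k in range(r + 1, n))
        c[r] = (m[r][n] - s) / m[r][r]
    return c


def fit_coefficients(terms: List[Term], xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares fit via the normal equations (Phi^T Phi) c = Phi^T y."""
    phi = [[f(x) for _, f in terms] for x in xs]
    k = len(terms)
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve_linear(ata, aty)


def nmse(pred: Sequence[float], ys: Sequence[float]) -> float:
    """Normalized mean squared error: MSE divided by the variance of the targets."""
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    mse = sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)
    return mse / var


def evaluate_equation(terms: List[Term], train: Data, test: Data) -> Tuple[float, List[float]]:
    """Hypothetical evaluation tool: fit coefficients on train, score NMSE on held-out test."""
    coeffs = fit_coefficients(terms, *train)
    xs, ys = test
    pred = [sum(c * f(x) for c, (_, f) in zip(coeffs, terms)) for x in xs]
    return nmse(pred, ys), coeffs


def discovery_loop(proposals, train: Data, test: Data, target: float = 1e-6):
    """Stand-in for the agent's loop: submit each candidate, record feedback, keep the best."""
    best: Optional[Tuple[str, float, List[float]]] = None
    history: List[Tuple[str, float]] = []
    for name, terms in proposals:  # an LLM agent would write these; here they are fixed
        score, coeffs = evaluate_equation(terms, train, test)
        history.append((name, score))  # feedback the agent would read before its next edit
        if best is None or score < best[1]:
            best = (name, score, coeffs)
        if score < target:  # stop early once the candidate fits well enough
            break
    return best, history


def truth(x: float) -> float:
    """Synthetic ground-truth law used only for this demo."""
    return 2.5 * math.sin(x) + 0.3 * x ** 2


train_x = [0.1 * i for i in range(1, 51)]       # in-domain: 0.1 .. 5.0
test_x = [5.0 + 0.1 * i for i in range(1, 21)]  # out-of-domain: 5.1 .. 7.0
train = (train_x, [truth(x) for x in train_x])
test = (test_x, [truth(x) for x in test_x])

one: Term = ("1", lambda x: 1.0)
proposals = [
    ("linear", [one, ("x", lambda x: x)]),
    ("quadratic", [one, ("x", lambda x: x), ("x^2", lambda x: x * x)]),
    ("sin+quadratic", [("sin(x)", math.sin), ("x^2", lambda x: x * x)]),
]
best, history = discovery_loop(proposals, train, test)
print(best[0], [round(c, 3) for c in best[2]])
```

Scoring on out-of-domain points makes the loop reject candidates that only interpolate the training range, which echoes the abstract's emphasis on generalization. Per the abstract, SR-Scientist's agent writes its own analysis code and candidate equations over a long horizon; here `history` only mimics the experimental feedback it would consume.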