MARS: Modular Agent with Reflective Search for Automated AI Research

Abstract

Automating AI research differs from general software engineering because evaluation is computationally expensive (e.g., it requires model training) and performance is hard to attribute to specific causes. Current LLM-based agents struggle in this setting, often generating monolithic scripts that ignore execution costs and causal factors. We introduce MARS (Modular Agent with Reflective Search), a framework optimized for autonomous AI research. MARS rests on three pillars: (1) Budget-Aware Planning, which uses cost-constrained Monte Carlo Tree Search (MCTS) to explicitly balance performance against execution expense; (2) Modular Construction, which employs a "Design-Decompose-Implement" pipeline to manage complex research repositories; and (3) Comparative Reflective Memory, which addresses credit assignment by analyzing differences between solutions to distill high-signal insights. Under comparable settings, MARS achieves state-of-the-art performance among open-source frameworks on MLE-Bench and remains competitive with the top methods on the global leaderboard. Furthermore, the system exhibits qualitative "Aha!" moments: 63% of all utilized lessons originate from cross-branch transfer, indicating that the agent effectively generalizes insights across search paths.
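As a rough illustration of what cost-constrained MCTS selection can look like, the sketch below scores each child with a standard UCT term minus a linear penalty on its estimated execution cost. It also rules out any child whose estimated cost exceeds the remaining budget. The abstract does not say how MARS combines performance and cost. The `Node` structure, the linear penalty, the `c` and `cost_weight` parameters, and the example node names are all hypothetical and were chosen only for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A candidate solution in the search tree (hypothetical structure)."""
    name: str
    visits: int = 0
    total_reward: float = 0.0
    est_cost: float = 0.0  # estimated execution cost, e.g. GPU-hours (assumed unit)
    children: list[Node] = field(default_factory=list)

    def mean_reward(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0


def cost_aware_uct(child: Node, parent_visits: int, remaining_budget: float,
                   c: float = 1.4, cost_weight: float = 0.5) -> float:
    """UCT score with a linear cost penalty (an assumed form, not MARS's formula)."""
    if remaining_budget <= 0 or child.est_cost > remaining_budget:
        return -math.inf  # infeasible: running this child would exceed the budget
    if child.visits == 0:
        return math.inf  # try unvisited, affordable children first
    exploit = child.mean_reward()
    explore = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    penalty = cost_weight * child.est_cost / remaining_budget
    return exploit + explore - penalty


def select_child(parent: Node, remaining_budget: float) -> Node | None:
    """Return the affordable child with the highest score, or None if none is affordable."""
    scored = [(cost_aware_uct(ch, parent.visits, remaining_budget), ch)
              for ch in parent.children]
    feasible = [(s, ch) for s, ch in scored if s != -math.inf]
    if not feasible:
        return None
    return max(feasible, key=lambda pair: pair[0])[1]


def backpropagate(path: list[Node], reward: float) -> None:
    """Add the observed reward to every node on the selected path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward


if __name__ == "__main__":
    root = Node("root", visits=10)
    cheap = Node("lightgbm_baseline", visits=4, total_reward=2.8, est_cost=1.0)
    costly = Node("large_transformer", visits=4, total_reward=3.0, est_cost=8.0)
    root.children = [cheap, costly]
    # The costly branch has a slightly higher mean reward, but the cost penalty flips the choice.
    print(select_child(root, remaining_budget=10.0).name)  # prints lightgbm_baseline
```

If `cost_weight` is set to 0, this reduces to plain UCT, and the higher-reward but more expensive branch would be chosen instead. The penalty is scaled by the remaining budget, so the same cost weighs more heavily as the budget runs out.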
