MARS: Modular Agent with Reflective Search for Automated AI Research
February 2, 2026
Authors: Jiefeng Chen, Bhavana Dalvi Mishra, Jaehyun Nam, Rui Meng, Tomas Pfister, Jinsung Yoon
cs.AI
Abstract
Automating AI research differs from general software engineering due to computationally expensive evaluation (e.g., model training) and opaque performance attribution. Current LLM-based agents struggle here, often generating monolithic scripts that ignore execution costs and causal factors. We introduce MARS (Modular Agent with Reflective Search), a framework optimized for autonomous AI research. MARS relies on three pillars: (1) Budget-Aware Planning via cost-constrained Monte Carlo Tree Search (MCTS) to explicitly balance performance with execution expense; (2) Modular Construction, employing a "Design-Decompose-Implement" pipeline to manage complex research repositories; and (3) Comparative Reflective Memory, which addresses credit assignment by analyzing solution differences to distill high-signal insights. MARS achieves state-of-the-art performance among open-source frameworks on MLE-Bench under comparable settings, maintaining competitiveness with the global leaderboard's top methods. Furthermore, the system exhibits qualitative "Aha!" moments, where 63% of all utilized lessons originate from cross-branch transfer, demonstrating that the agent effectively generalizes insights across search paths.