MARS: Modular Agent with Reflective Search for Automated AI Research

Abstract

Automating AI research differs from general software engineering in two respects: evaluation is computationally expensive (e.g., model training), and performance attribution is opaque. Current LLM-based agents struggle in this setting, often generating monolithic scripts that ignore execution costs and causal factors. We introduce MARS (Modular Agent with Reflective Search), a framework optimized for autonomous AI research. MARS rests on three pillars: (1) Budget-Aware Planning via cost-constrained Monte Carlo Tree Search (MCTS), which explicitly balances performance against execution expense; (2) Modular Construction, which employs a "Design-Decompose-Implement" pipeline to manage complex research repositories; and (3) Comparative Reflective Memory, which addresses credit assignment by analyzing differences between solutions to distill high-signal insights. Under comparable settings, MARS achieves state-of-the-art performance among open-source frameworks on MLE-Bench and remains competitive with the top methods on the global leaderboard. The system also exhibits qualitative "Aha!" moments: 63% of all utilized lessons originate from cross-branch transfer, indicating that the agent generalizes insights across search paths.
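
To make the idea of budget-aware planning concrete, the following is a minimal Python sketch of a cost-penalized UCT selection rule. It is not the scoring function used by MARS, which the abstract does not specify. The `Node` fields, the penalty weight `lam`, the exploration constant `c`, and the rule of filtering out children whose estimated cost exceeds the remaining budget are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical search-tree node; field names are illustrative only."""
    name: str
    visits: int = 0
    total_reward: float = 0.0  # sum of scores observed under this node
    est_cost: float = 0.0      # assumed estimate of execution cost (e.g. hours)
    children: List["Node"] = field(default_factory=list)


def cost_aware_uct(parent_visits: int, child: Node,
                   c: float = 1.4, lam: float = 0.1) -> float:
    """Standard UCT score minus a linear cost penalty (an assumed form)."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    mean_reward = child.total_reward / child.visits
    exploration = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    return mean_reward + exploration - lam * child.est_cost


def select_child(parent: Node, remaining_budget: float,
                 c: float = 1.4, lam: float = 0.1) -> Optional[Node]:
    """Pick the best-scoring child that still fits in the remaining budget."""
    affordable = [ch for ch in parent.children if ch.est_cost <= remaining_budget]
    if not affordable:
        return None  # nothing can be expanded within the budget
    return max(affordable, key=lambda ch: cost_aware_uct(parent.visits, ch, c, lam))
```

With `lam = 0` the rule reduces to plain UCT. Increasing `lam` shifts selection toward cheaper branches even when their average score is slightly lower, which is one simple way to trade performance against execution expense.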
