MARS: Modular Agent with Reflective Search for Automated AI Research
February 2, 2026
Authors: Jiefeng Chen, Bhavana Dalvi Mishra, Jaehyun Nam, Rui Meng, Tomas Pfister, Jinsung Yoon
cs.AI
Abstract
Automating AI research differs from general software engineering due to computationally expensive evaluation (e.g., model training) and opaque performance attribution. Current LLM-based agents struggle here, often generating monolithic scripts that ignore execution costs and causal factors. We introduce MARS (Modular Agent with Reflective Search), a framework optimized for autonomous AI research. MARS relies on three pillars: (1) Budget-Aware Planning via cost-constrained Monte Carlo Tree Search (MCTS) to explicitly balance performance with execution expense; (2) Modular Construction, employing a "Design-Decompose-Implement" pipeline to manage complex research repositories; and (3) Comparative Reflective Memory, which addresses credit assignment by analyzing solution differences to distill high-signal insights. MARS achieves state-of-the-art performance among open-source frameworks on MLE-Bench under comparable settings, maintaining competitiveness with the global leaderboard's top methods. Furthermore, the system exhibits qualitative "Aha!" moments, where 63% of all utilized lessons originate from cross-branch transfer, demonstrating that the agent effectively generalizes insights across search paths.
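The abstract does not state the selection rule behind the cost-constrained MCTS, so the following is only an illustrative sketch of what budget-aware selection could look like, not the MARS implementation. It assumes a standard UCT score with two hypothetical additions: children whose estimated cost exceeds the remaining budget are skipped, and affordable children are penalized in proportion to the share of the budget they would consume. The names `Node`, `est_cost`, `cost_aware_score`, `select_child`, and the parameters `c` and `lam` are all assumptions introduced for this example.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A search-tree node; every field here is hypothetical."""
    name: str
    visits: int = 0
    total_reward: float = 0.0
    est_cost: float = 0.0  # assumed estimate of evaluation cost (e.g. GPU-hours)
    children: List["Node"] = field(default_factory=list)


def cost_aware_score(parent_visits: int, child: Node, remaining_budget: float,
                     c: float = 1.4, lam: float = 0.5) -> float:
    """UCT score minus a penalty for the fraction of budget the child would use."""
    if child.visits == 0:
        return float("inf")  # always try unvisited, affordable children first
    mean_reward = child.total_reward / child.visits
    exploration = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    cost_penalty = lam * child.est_cost / remaining_budget
    return mean_reward + exploration - cost_penalty


def select_child(parent: Node, remaining_budget: float,
                 c: float = 1.4, lam: float = 0.5) -> Optional[Node]:
    """Pick the best affordable child, or None if nothing fits the budget."""
    if remaining_budget <= 0:
        return None
    affordable = [ch for ch in parent.children if ch.est_cost <= remaining_budget]
    if not affordable:
        return None
    return max(
        affordable,
        key=lambda ch: cost_aware_score(parent.visits, ch, remaining_budget, c, lam),
    )


# Usage: two candidates with similar mean reward but very different cost.
root = Node("root", visits=10)
cheap = Node("cheap", visits=5, total_reward=3.0, est_cost=1.0)
expensive = Node("expensive", visits=5, total_reward=3.2, est_cost=8.0)
root.children = [cheap, expensive]

choice = select_child(root, remaining_budget=10.0)
print(choice.name if choice else None)  # selects "cheap" under the cost penalty
```

With the penalty disabled (`lam=0.0`) the same call falls back to plain UCT and prefers the slightly higher-reward but costlier node, which is the trade-off between performance and execution expense that the abstract describes.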