AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
June 12, 2025
Authors: Yixin Ou, Yujie Luo, Jingsheng Zheng, Lanning Wei, Shuofei Qiao, Jintian Zhang, Da Zheng, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Large Language Model (LLM) agents have shown great potential in addressing
real-world data science problems. LLM-driven data science agents promise to
automate the entire machine learning pipeline, yet their real-world
effectiveness remains limited. Existing frameworks depend on rigid, pre-defined
workflows and inflexible coding strategies; consequently, they excel only on
relatively simple, classical problems and fail to capture the empirical
expertise that human practitioners bring to complex, innovative tasks. In this
work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework
that overcomes these deficiencies through three key advances: (1) a curated
expert knowledge base that grounds the agent in domain expert knowledge, (2) an
agentic knowledgeable tree search algorithm that strategically explores
possible solutions, and (3) a self-adaptive coding strategy that dynamically
tailors code generation to task complexity. Evaluations on two automated data
science benchmarks demonstrate that AutoMind delivers superior performance
versus state-of-the-art baselines. Additional analyses confirm favorable
effectiveness, efficiency, and qualitative solution quality, highlighting
AutoMind as an efficient and robust step toward fully automated data science.
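
The abstract names a tree search over candidate solutions and a coding strategy that adapts to task complexity, but gives no implementation detail. The sketch below is only an illustration of how those two ideas could fit together, not AutoMind's actual algorithm. Every name in it (`Node`, `choose_coding_mode`, `tree_search`, the `expand` and `evaluate` callbacks, the 0.5 threshold) is an assumption made for this example; in a real agent, `expand` would call an LLM to draft or refine a plan and `evaluate` would run the generated code against a validation metric.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate solution in the search tree (hypothetical structure)."""
    plan: str
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    expanded: bool = False


def choose_coding_mode(complexity: float, threshold: float = 0.5) -> str:
    """Assumed rule: simple plans are coded in one pass, complex ones stepwise."""
    return "one_pass" if complexity < threshold else "stepwise"


def tree_search(
    root_plan: str,
    expand: Callable[[str], List[str]],
    evaluate: Callable[[str], float],
    steps: int,
) -> Node:
    """Greedy best-first search: repeatedly expand the best unexpanded node."""
    root = Node(root_plan, evaluate(root_plan))
    nodes = [root]
    for _ in range(steps):
        frontier = [n for n in nodes if not n.expanded]
        if not frontier:
            break  # nothing left to expand
        best = max(frontier, key=lambda n: n.score)
        best.expanded = True
        for plan in expand(best.plan):
            child = Node(plan, evaluate(plan), parent=best)
            best.children.append(child)
            nodes.append(child)
    # Return the highest-scoring solution seen anywhere in the tree.
    return max(nodes, key=lambda n: n.score)


# Toy stand-ins for the LLM-backed callbacks (purely illustrative).
def toy_expand(plan: str) -> List[str]:
    return [plan + "a", plan + "b"]


def toy_evaluate(plan: str) -> float:
    return plan.count("a") - 0.5 * plan.count("b")


best = tree_search("", toy_expand, toy_evaluate, steps=3)
```

With the toy callbacks, each step expands the most promising plan found so far, so the search follows the branch that keeps appending `"a"`. A knowledge base, as described in the abstract, would plausibly enter through `expand`, by conditioning the drafted plans on retrieved expert material.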