AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

Abstract

Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind outperforms state-of-the-art baselines. Additional analyses confirm its advantages in effectiveness, efficiency, and solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.
