EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

June 2, 2026
Authors: Zherui Yang, Fan Liu, Yansong Ning, Hao Liu
cs.AI

Abstract

Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and lack of principled long-horizon context management, hindering their ability to accumulate reusable experience across tasks and operate reliably in multi-stage, iterative data science pipelines. To address these challenges, we introduce EvoDS, a self-evolving autonomous data science agent that learns to expand its skills and adaptively manage long-horizon context through agentic reinforcement learning. Specifically, EvoDS introduces two key strategies: (1) an Autonomous Skill Acquisition (ASA) mechanism, which enables agents to synthesize, validate, and reuse executable skills; and (2) an Adaptive Context Compression (ACC) strategy, which treats context management as a learned control problem rather than passive truncation. These strategies are orchestrated within a two-stage multi-agent training scheme, enabling EvoDS to autonomously improve over time. Theoretically, we prove that EvoDS's hierarchical design reduces tool-selection error and that its optimization objective aligns with an information bottleneck principle, ensuring efficient context use. Empirically, EvoDS outperforms state-of-the-art open-source data science agents by an average of 28.9% across four diverse benchmarks while eliminating out-of-token failures. Our code and data are available at https://github.com/usail-hkust/EvoDS.
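
The "synthesize, validate, and reuse" loop described for skill acquisition can be pictured with a small sketch. This is only an illustration under stated assumptions, not the EvoDS implementation: the `SkillLibrary` class, its `register`, `call`, and `names` methods, and the test-case format are all hypothetical names introduced here. A candidate skill arrives as source code, is run against a few checks, and is stored for later reuse only if every check passes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration only: none of these names come from the EvoDS codebase.
Skill = Callable[[List[float]], float]
TestCase = Tuple[List[float], float]  # (input data, expected result)


class SkillLibrary:
    """Stores skills that passed validation so later tasks can reuse them."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, name: str, source: str, tests: List[TestCase]) -> bool:
        """Compile candidate source, run it on the tests, keep it only if all pass."""
        namespace: dict = {}
        try:
            exec(source, namespace)  # a real system would sandbox this step
            candidate = namespace[name]
            ok = all(abs(candidate(x) - y) < 1e-9 for x, y in tests)
        except Exception:
            ok = False  # any failure to compile or run rejects the candidate
        if ok:
            self._skills[name] = candidate
        return ok

    def call(self, name: str, data: List[float]) -> float:
        """Reuse a previously validated skill by name."""
        return self._skills[name](data)

    def names(self) -> List[str]:
        return sorted(self._skills)


library = SkillLibrary()

# A correct candidate and a buggy one (the "median" just returns the first element).
good = "def mean(xs):\n    return sum(xs) / len(xs)\n"
bad = "def median(xs):\n    return xs[0]\n"

accepted_good = library.register("mean", good, [([1.0, 2.0, 3.0], 2.0)])
accepted_bad = library.register("median", bad, [([3.0, 1.0, 2.0], 2.0)])
```

In this sketch the buggy candidate never enters the library, so the set of callable skills grows only with code that has passed its checks. How EvoDS itself synthesizes and validates skills is specified in the paper and repository, not here.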