EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management
June 2, 2026
Authors: Zherui Yang, Fan Liu, Yansong Ning, Hao Liu
cs.AI
Abstract
Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and their lack of principled long-horizon context management. These limits hinder their ability to accumulate reusable experience across tasks and to operate reliably in multi-stage, iterative data science pipelines. To address these challenges, we introduce EvoDS, a self-evolving autonomous data science agent that learns to expand its skills and adaptively manage long-term context through agentic reinforcement learning. Specifically, EvoDS introduces two key strategies: (1) an Autonomous Skill Acquisition (ASA) mechanism, which enables agents to synthesize, validate, and reuse executable skills; and (2) an Adaptive Context Compression (ACC) strategy, which treats context management as a learned control problem rather than passive truncation. These strategies are orchestrated within a two-stage multi-agent training scheme, enabling EvoDS to improve autonomously over time. Theoretically, we prove that EvoDS's hierarchical design reduces tool-selection error and that its optimization objective aligns with an information bottleneck principle, ensuring efficient context use. Empirically, EvoDS outperforms state-of-the-art open-source data science agents by an average of 28.9% across four diverse benchmarks while eliminating out-of-token failures. Our code and data are available at https://github.com/usail-hkust/EvoDS.
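As a rough illustration of the synthesize, validate, and reuse loop that the abstract attributes to ASA, the sketch below keeps a generated function only if it passes a few input/output checks. It is not taken from the EvoDS repository: the `SkillLibrary` class, its `try_register` and `call` methods, and the `column_mean` skill are all hypothetical names chosen for this example, and the learned (reinforcement learning) part of the method is not modeled here.

```python
from typing import Any, Callable, Dict, List, Tuple


class SkillLibrary:
    """Hypothetical store of validated, reusable skills (not the EvoDS API)."""

    def __init__(self) -> None:
        self._skills: Dict[str, Callable[..., Any]] = {}

    def try_register(
        self,
        name: str,
        source: str,
        checks: List[Tuple[tuple, Any]],
    ) -> bool:
        """Compile `source`, run it on the checks, and keep it only if all pass."""
        namespace: Dict[str, Any] = {}
        try:
            # Synthesize: turn generated source code into a callable.
            # Assumption: a real system would run this in a sandbox.
            exec(source, namespace)
            candidate = namespace[name]
            # Validate: compare outputs against known input/output pairs.
            for args, expected in checks:
                if candidate(*args) != expected:
                    return False
        except Exception:
            return False  # any error rejects the candidate skill
        self._skills[name] = candidate
        return True

    def call(self, name: str, *args: Any) -> Any:
        """Reuse: invoke a previously validated skill by name."""
        return self._skills[name](*args)

    def names(self) -> List[str]:
        return sorted(self._skills)


library = SkillLibrary()

# Two candidate implementations of the same hypothetical skill.
bad = "def column_mean(values):\n    return sum(values)\n"
good = "def column_mean(values):\n    return sum(values) / len(values)\n"
checks = [(([1.0, 2.0, 3.0],), 2.0)]

rejected = library.try_register("column_mean", bad, checks)
accepted = library.try_register("column_mean", good, checks)
result = library.call("column_mean", [2.0, 4.0])
```

Here the faulty candidate is discarded because it fails the check, while the correct one is stored and can be called again on later tasks without being regenerated.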