EvoDS: A Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

Abstract

Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and their lack of principled long-horizon context management, which hinders their ability to accumulate reusable experience across tasks and to operate reliably in multi-stage, iterative data science pipelines. To address these challenges, we introduce EvoDS, a self-evolving autonomous data science agent that learns to expand its skills and adaptively manage long-term context through agentic reinforcement learning. Specifically, EvoDS introduces two key strategies: (1) an Autonomous Skill Acquisition (ASA) mechanism, which enables agents to synthesize, validate, and reuse executable skills; and (2) an Adaptive Context Compression (ACC) strategy, which treats context management as a learned control problem rather than passive truncation. These strategies are orchestrated within a two-stage multi-agent training scheme, enabling EvoDS to autonomously improve over time. Theoretically, we prove that EvoDS's hierarchical design reduces tool-selection error and that its optimization objective aligns with an information bottleneck principle, ensuring efficient context use. Empirically, EvoDS outperforms state-of-the-art open-source data science agents by an average of 28.9% across four diverse benchmarks while eliminating out-of-token failures. Our code and data are available at https://github.com/usail-hkust/EvoDS.
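
For readers unfamiliar with the information bottleneck principle mentioned above, its standard form is sketched below. The mapping of symbols to EvoDS is an assumption made here for illustration only and is not taken from the paper: X stands for the full interaction history, Z for the compressed context the agent keeps, Y for the task-relevant signal (for example, the information needed for future steps), and \beta > 0 for a trade-off weight. The objective EvoDS actually optimizes may differ in form.

```latex
% Standard information bottleneck Lagrangian.
% Assumed (hypothetical) roles: X = full history, Z = compressed context,
% Y = task-relevant signal, \beta = trade-off weight.
\begin{equation*}
  \min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
\end{equation*}
% I(X;Z): how much of the raw history the compressed context retains (cost).
% I(Z;Y): how much task-relevant information the compressed context preserves.
```

Under this reading, passive truncation fixes Z by position alone, whereas a learned compression policy can choose Z to keep I(Z; Y) high while shrinking I(X; Z).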