EvoDS: A Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

Abstract

Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and their lack of principled long-horizon context management, which hinders their ability to accumulate reusable experience across tasks and to operate reliably in multi-stage, iterative data science pipelines. To address these challenges, we introduce EvoDS, a self-evolving autonomous data science agent that learns to expand its skills and adaptively manage long-term context through agentic reinforcement learning. Specifically, EvoDS introduces two key strategies: (1) an Autonomous Skill Acquisition (ASA) mechanism, which enables agents to synthesize, validate, and reuse executable skills; and (2) an Adaptive Context Compression (ACC) strategy, which treats context management as a learned control problem rather than passive truncation. These strategies are orchestrated within a two-stage multi-agent training scheme, enabling EvoDS to improve autonomously over time. Theoretically, we prove that EvoDS's hierarchical design reduces tool-selection error and that its optimization objective aligns with an information bottleneck principle, ensuring efficient context use. Empirically, EvoDS outperforms state-of-the-art open-source data science agents by an average of 28.9% across four diverse benchmarks while eliminating out-of-token failures. Our code and data are available at https://github.com/usail-hkust/EvoDS.