daVinci-Dev: Agent-native Mid-training for Software Engineering

Abstract


Recently, the frontier of Large Language Model (LLM) capabilities has shifted from single-turn code generation to agentic software engineering, a paradigm in which models autonomously navigate, edit, and test complex repositories. While post-training methods have become the de facto approach for code agents, **agentic mid-training**, i.e., mid-training (MT) on large-scale data that mirrors authentic agentic workflows, remains critically underexplored due to its substantial resource requirements, despite offering a more scalable path to instilling foundational agentic behaviors than relying solely on expensive reinforcement learning. A central challenge in realizing effective agentic mid-training is the distribution mismatch between static training data and the dynamic, feedback-rich environment of real development. To address this, we present a systematic study of agentic mid-training, establishing both the data synthesis principles and the training methodology for effective agent development at scale. Central to our approach is **agent-native data**: supervision comprising two complementary types of trajectories. **Contextually-native trajectories** preserve the complete information flow an agent experiences, offering broad coverage and diversity; **environmentally-native trajectories** are collected from executable repositories where observations stem from actual tool invocations and test executions, providing depth and interaction authenticity. We verify the models' agentic capabilities on `SWE-Bench Verified`. Under two post-training settings with an aligned base model and agentic scaffold, our approach outperforms `Kimi-Dev`, the previous open software engineering mid-training recipe, while using less than half as many mid-training tokens (73.1B). Beyond this relative advantage, our best-performing 32B and 72B models achieve **56.1%** and **58.5%** resolution rates, respectively, which are ...
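To make the distinction between the two trajectory types more concrete, here is a minimal, hypothetical Python sketch. The abstract does not specify a data format, so the names `Nativeness`, `Step`, `Trajectory`, the `executed` flag, and the `<action>`/`<observation>` serialization tags are illustrative assumptions rather than the paper's actual schema. The sketch models one idea from the text: an environmentally-native trajectory is one whose observations all come from actually running the agent's actions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Nativeness(Enum):
    # Hypothetical labels for the two trajectory types named in the abstract.
    CONTEXTUAL = "contextually-native"        # broad coverage and diversity
    ENVIRONMENTAL = "environmentally-native"  # observations from real execution


@dataclass
class Step:
    action: str       # a tool call the agent issues (hypothetical string format)
    observation: str  # what the agent sees back after the action
    executed: bool    # True if the observation came from actually running the action


@dataclass
class Trajectory:
    kind: Nativeness
    steps: List[Step] = field(default_factory=list)

    def is_consistent(self) -> bool:
        # Assumption: an environmentally-native trajectory should contain only
        # observations produced by real tool invocations or test runs.
        if self.kind is Nativeness.ENVIRONMENTAL:
            return all(s.executed for s in self.steps)
        return True

    def to_training_text(self) -> str:
        # Flatten the trajectory into one sequence so the model sees the full
        # action/observation flow, in order, as mid-training text.
        parts = []
        for s in self.steps:
            parts.append(f"<action>{s.action}</action>")
            parts.append(f"<observation>{s.observation}</observation>")
        return "\n".join(parts)


# Usage: a one-step trajectory whose observation came from a real test run.
traj = Trajectory(
    Nativeness.ENVIRONMENTAL,
    [Step("run_tests tests/test_io.py", "1 failed, 4 passed", executed=True)],
)
print(traj.is_consistent())  # True
print(traj.to_training_text())
```

Tracking provenance per step, not per trajectory, keeps the check simple. It also allows a mixed corpus to be filtered or reweighted by how much of each trajectory was grounded in real execution.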
