daVinci-Dev: Agent-native Mid-training for Software Engineering

Abstract



Recently, the frontier of Large Language Model (LLM) capabilities has shifted from single-turn code generation to agentic software engineering, a paradigm in which models autonomously navigate, edit, and test complex repositories. Post-training methods have become the de facto approach for code agents. By contrast, **agentic mid-training**, i.e., mid-training (MT) on large-scale data that mirrors authentic agentic workflows, remains critically underexplored because of its substantial resource requirements. This is despite its offering a more scalable path to instilling foundational agentic behaviors than relying solely on expensive reinforcement learning. A central challenge in realizing effective agentic mid-training is the distribution mismatch between static training data and the dynamic, feedback-rich environment of real development. To address this, we present a systematic study of agentic mid-training, establishing both the data synthesis principles and the training methodology for effective agent development at scale. Central to our approach is **agent-native data**: supervision comprising two complementary types of trajectories. **Contextually-native trajectories** preserve the complete information flow an agent experiences, offering broad coverage and diversity. **Environmentally-native trajectories** are collected from executable repositories where observations stem from actual tool invocations and test executions, providing depth and interaction authenticity. We evaluate the models' agentic capabilities on `SWE-Bench Verified`. We compare against `Kimi-Dev`, the previous open software engineering mid-training recipe, under two post-training settings with an aligned base model and agentic scaffold. Our recipe outperforms it in both settings while using less than half as many mid-training tokens (73.1B). Beyond this relative advantage, our best-performing 32B and 72B models achieve resolution rates of **56.1%** and **58.5%**, respectively, which are ...
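To make the distinction between the two trajectory types more concrete, below is a minimal sketch of how such records might be represented and serialized for training. The schema is entirely hypothetical and is not the data format used by daVinci-Dev, which the abstract does not describe. This covers `Step`, `Trajectory`, the `executed` flag, and the tag-based serialization in `to_training_text`. The sketch illustrates only the distinction the abstract draws. A contextually-native trajectory keeps the full ordered sequence of what the agent sees and does. An environmentally-native trajectory additionally requires that every observation came from a real tool invocation or test run.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical schema for illustration only; not daVinci-Dev's actual format.
@dataclass
class Step:
    action: str       # what the agent does, e.g. a tool call (hypothetical encoding)
    observation: str  # what the agent sees after the action
    executed: bool    # True if the observation came from a real tool or test execution


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)

    def is_environmentally_native(self) -> bool:
        # In this sketch, a trajectory counts as environmentally-native only if it is
        # non-empty and every observation stems from actual execution.
        return len(self.steps) > 0 and all(s.executed for s in self.steps)


def to_training_text(traj: Trajectory) -> str:
    """Flatten a trajectory into one training string, keeping the complete
    information flow (task, then every action and observation) in order."""
    parts = [f"<task>{traj.task}</task>"]
    for s in traj.steps:
        parts.append(f"<action>{s.action}</action>")
        parts.append(f"<observation>{s.observation}</observation>")
    return "\n".join(parts)


if __name__ == "__main__":
    env_traj = Trajectory(
        task="Fix the failing test in utils.py",
        steps=[
            Step("run_tests()", "1 failed", executed=True),
            Step("edit utils.py", "file updated", executed=True),
            Step("run_tests()", "all passed", executed=True),
        ],
    )
    print(env_traj.is_environmentally_native())
    print(to_training_text(env_traj))
```

Serializing the whole sequence in order is one simple way to keep an agent's full information flow inside a single training example. A real pipeline would also need tokenization, truncation, and filtering policies that the abstract does not specify.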
