

daVinci-Dev: Agent-native Mid-training for Software Engineering

January 26, 2026
作者: Ji Zeng, Dayuan Fu, Tiantian Mi, Yumin Zhuang, Yaxing Huang, Xuefeng Li, Lyumanshan Ye, Muhang Xie, Qishuo Hua, Zhen Huang, Mohan Jiang, Hanning Wang, Jifan Lin, Yang Xiao, Jie Sun, Yunze Wu, Pengfei Liu
cs.AI

Abstract

Recently, the frontier of Large Language Model (LLM) capabilities has shifted from single-turn code generation to agentic software engineering: a paradigm in which models autonomously navigate, edit, and test complex repositories. While post-training methods have become the de facto approach for code agents, **agentic mid-training** (mid-training on large-scale data that mirrors authentic agentic workflows) remains critically underexplored due to its substantial resource requirements, despite offering a more scalable path to instilling foundational agentic behaviors than relying solely on expensive reinforcement learning. A central challenge in realizing effective agentic mid-training is the distribution mismatch between static training data and the dynamic, feedback-rich environment of real development. To address this, we present a systematic study of agentic mid-training, establishing both data synthesis principles and a training methodology for effective agent development at scale. Central to our approach is **agent-native data**: supervision comprising two complementary types of trajectories. **Contextually-native trajectories** preserve the complete information flow an agent experiences, offering broad coverage and diversity; **environmentally-native trajectories** are collected from executable repositories, where observations stem from actual tool invocations and test executions, providing interaction depth and authenticity. We verify the model's agentic capabilities on `SWE-Bench Verified`, demonstrating superiority over the previous open software engineering mid-training recipe `Kimi-Dev` under two post-training settings with an aligned base model and agentic scaffold, while using less than half the mid-training tokens (73.1B). Beyond this relative advantage, our best-performing 32B and 72B models achieve **56.1%** and **58.5%** resolution rates, respectively, which are ...
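To make the two trajectory types concrete, the following is a minimal, hypothetical sketch of what agent-native supervision data might look like when flattened into training text. The paper does not publish its data schema; the `Step`/`Trajectory` classes, field names, and the example repair loop (run tests, edit, re-run tests) are all illustrative assumptions, not the authors' format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema: one action/observation pair per agent turn.
@dataclass
class Step:
    action: str       # tool call or edit the agent issues
    observation: str  # feedback the agent receives (tool output, test result)

@dataclass
class Trajectory:
    # "contextual": full information flow preserved from logged contexts.
    # "environmental": observations produced by real tool/test execution.
    kind: str
    steps: List[Step] = field(default_factory=list)

    def to_training_text(self) -> str:
        """Flatten the interaction into one supervision string,
        preserving the complete action/observation flow."""
        return "\n".join(
            f"ACTION: {s.action}\nOBSERVATION: {s.observation}"
            for s in self.steps
        )

# An environmentally-native example: observations come from actual test runs.
traj = Trajectory(kind="environmental", steps=[
    Step("pytest tests/test_parser.py", "1 failed, 12 passed"),
    Step("edit parser.py: fix off-by-one in tokenize()", "file saved"),
    Step("pytest tests/test_parser.py", "13 passed"),
])
text = traj.to_training_text()
print(text.count("ACTION:"))  # one ACTION line per step
```

The key property this sketch illustrates is that the supervision signal interleaves actions with grounded observations, rather than presenting code generation as a single-turn mapping from problem to patch.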