

daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

February 2, 2026
Authors: Mohan Jiang, Dayuan Fu, Junhao Shi, Ji Zeng, Weiye Si, Keyu Li, Xuefeng Li, Yang Xiao, Wenjie Li, Dequan Wang, Pengfei Liu
cs.AI

Abstract

While Large Language Models (LLMs) excel at short-term tasks, scaling them to long-horizon agentic workflows remains challenging. The core bottleneck is the scarcity of training data that captures authentic long-range dependency structures and cross-stage evolutionary dynamics: existing synthesis methods are either confined to single-feature scenarios by the model's own distribution, or incur prohibitive human annotation costs, and thus fail to provide scalable, high-quality supervision. We address this by reconceptualizing data synthesis through the lens of real-world software evolution. Our key insight is that Pull Request (PR) sequences naturally embody the supervision signals needed for long-horizon learning: they decompose complex objectives into verifiable submission units, maintain functional coherence across iterations, and encode authentic refinement patterns through bug-fix histories. Building on this, we propose daVinci-Agency, which systematically mines structured supervision from chains of PRs through three interlocking mechanisms: (1) progressive task decomposition via successive commits, (2) long-term consistency enforcement through unified functional objectives, and (3) verifiable refinement from authentic bug-fix trajectories. Unlike synthetic trajectories that treat each step independently, daVinci-Agency's PR-grounded structure inherently preserves the causal dependencies and iterative refinements essential for teaching persistent goal-directed behavior, and aligns naturally with project-level, full-cycle task modeling. The resulting trajectories are substantial, averaging 85k tokens and 116 tool calls, yet remarkably data-efficient: fine-tuning GLM-4.6 on 239 daVinci-Agency samples yields broad improvements across benchmarks, notably a 47% relative gain on Toolathlon. Beyond benchmark performance, our analysis confirms...