daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
February 2, 2026
Authors: Mohan Jiang, Dayuan Fu, Junhao Shi, Ji Zeng, Weiye Si, Keyu Li, Xuefeng Li, Yang Xiao, Wenjie Li, Dequan Wang, Pengfei Liu
cs.AI
Abstract
While Large Language Models (LLMs) excel at short-term tasks, scaling them to long-horizon agentic workflows remains challenging. The core bottleneck is the scarcity of training data that captures authentic long-dependency structures and cross-stage evolutionary dynamics: existing synthesis methods are either confined by the model's own distribution to single-feature scenarios or incur prohibitive human-annotation costs, and so fail to provide scalable, high-quality supervision. We address this by reconceptualizing data synthesis through the lens of real-world software evolution. Our key insight: Pull Request (PR) sequences naturally embody the supervision signals needed for long-horizon learning. They decompose complex objectives into verifiable submission units, maintain functional coherence across iterations, and encode authentic refinement patterns through bug-fix histories. Building on this, we propose daVinci-Agency, which systematically mines structured supervision from chains of PRs through three interlocking mechanisms: (1) progressive task decomposition via successive commits, (2) long-term consistency enforcement through unified functional objectives, and (3) verifiable refinement from authentic bug-fix trajectories. Unlike synthetic trajectories that treat each step independently, daVinci-Agency's PR-grounded structure inherently preserves the causal dependencies and iterative refinements essential for teaching persistent goal-directed behavior, and it aligns naturally with project-level, full-cycle task modeling. The resulting trajectories are substantial (averaging 85k tokens and 116 tool calls) yet remarkably data-efficient: fine-tuning GLM-4.6 on 239 daVinci-Agency samples yields broad improvements across benchmarks, notably a 47% relative gain on Toolathlon. Beyond benchmark performance, our analysis confirms...
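The abstract does not describe the mining pipeline in detail. As a rough illustration of the three mechanisms it names, the Python sketch below (every class, function, and data value here is hypothetical, not taken from the paper) flattens a chain of PRs into one long-horizon trajectory: ordered commits provide progressive decomposition, a shared functional objective is enforced across the chain for long-term consistency, and bug-fix commits are tagged as verifiable refinement steps.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    message: str
    diff: str
    is_bug_fix: bool = False  # refinement signal mined from fix-style commits


@dataclass
class PullRequest:
    title: str
    objective: str              # unified functional goal shared by the chain
    commits: list               # ordered commits = progressive decomposition


def build_trajectory(pr_chain):
    """Flatten a chain of PRs into one long-horizon training trajectory.

    Each commit becomes one verifiable step; bug-fix commits are labeled
    as refinement so the trainee sees authentic error-correction patterns.
    """
    objective = pr_chain[0].objective
    # long-term consistency: every PR in the chain must share one objective
    assert all(pr.objective == objective for pr in pr_chain)
    steps = []
    for pr in pr_chain:
        for commit in pr.commits:
            steps.append({
                "goal": objective,
                "action": commit.message,
                "patch": commit.diff,
                "kind": "refinement" if commit.is_bug_fix else "progress",
            })
    return steps


# Toy two-PR chain with one authentic bug-fix step in the middle.
prs = [
    PullRequest("Add parser", "support CSV import", [
        Commit("add csv reader", "+def read_csv(): ..."),
        Commit("fix off-by-one in header row", "-row[i]\n+row[i + 1]",
               is_bug_fix=True),
    ]),
    PullRequest("Add validation", "support CSV import", [
        Commit("validate column types", "+def validate(): ..."),
    ]),
]
trajectory = build_trajectory(prs)
```

In this toy form, the causal ordering of steps and the refinement labels come directly from the repository history rather than from a generator model, which is the property the abstract credits for preserving long-range dependencies.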