Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
March 26, 2026
Authors: Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Xiaoxi Jiang, Guanjun Jiang
cs.AI
Abstract
Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet manually authoring skills creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or sequentially overfits to non-generalizable trajectory-local lessons. To overcome this, we introduce Trace2Skill, a framework that mirrors how human experts author skills: by holistically analyzing broad execution experience before distilling it into a single, comprehensive guide. Instead of reacting sequentially to individual trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to analyze a diverse pool of executions, extracts trajectory-specific lessons, and hierarchically consolidates them into a unified, conflict-free skill directory via inductive reasoning. Trace2Skill supports both deepening existing human-written skills and creating new ones from scratch. Experiments in challenging domains such as spreadsheet manipulation, VisionQA, and math reasoning show that Trace2Skill significantly improves upon strong baselines, including Anthropic's official xlsx skills. Crucially, this trajectory-grounded evolution does not merely memorize task instances or model-specific quirks: evolved skills transfer across LLM scales and generalize to out-of-distribution settings. For example, skills evolved by Qwen3.5-35B on its own trajectories improved a Qwen3.5-122B agent by up to 57.65 absolute percentage points on WikiTableQuestions. Ultimately, our results demonstrate that complex agent experience can be packaged into highly transferable, declarative skills: no parameter updates, no external retrieval modules, and open-source models as small as 35B parameters suffice.
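The abstract's pipeline (a parallel fleet of sub-agents mining lessons from trajectories, followed by hierarchical consolidation into one conflict-free skill directory) can be sketched roughly as below. This is a minimal illustration only, not the paper's implementation: `extract_lessons` and `consolidate` stand in for the LLM-driven sub-agent and inductive-consolidation steps, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_lessons(trajectory):
    """Stand-in for one LLM sub-agent: mine trajectory-specific lessons
    from a single execution trace (here, one lesson per recorded step)."""
    return [f"lesson about step '{step}'" for step in trajectory["steps"]]

def consolidate(lesson_groups):
    """Stand-in for inductive consolidation: merge per-trajectory lessons
    into a single deduplicated skill list. Recurrence across trajectories
    serves as a crude signal for ranking and conflict resolution."""
    counts = {}
    for group in lesson_groups:
        for lesson in group:
            counts[lesson] = counts.get(lesson, 0) + 1
    # lessons that recur across more trajectories surface first
    return sorted(counts, key=counts.get, reverse=True)

def trace2skill(trajectories, max_workers=4):
    """Dispatch sub-agents over trajectories in parallel, then consolidate."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        lesson_groups = list(pool.map(extract_lessons, trajectories))
    return consolidate(lesson_groups)
```

In this toy form, lessons seen in multiple trajectories rank ahead of one-off observations, mirroring (very loosely) the idea of distilling broad experience rather than overfitting to any single trace.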