Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
March 26, 2026
Authors: Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Xiaoxi Jiang, Guanjun Jiang
cs.AI
Abstract
Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or sequentially overfits to non-generalizable trajectory-local lessons. To overcome this, we introduce Trace2Skill, a framework that mirrors how human experts author skills: by holistically analyzing broad execution experience before distilling it into a single, comprehensive guide. Instead of reacting sequentially to individual trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to analyze a diverse pool of executions. It extracts trajectory-specific lessons and hierarchically consolidates them into a unified, conflict-free skill directory via inductive reasoning. Trace2Skill supports both deepening existing human-written skills and creating new ones from scratch. Experiments in challenging domains, such as spreadsheet manipulation, VisionQA, and math reasoning, show that Trace2Skill significantly improves upon strong baselines, including Anthropic's official xlsx skills. Crucially, this trajectory-grounded evolution does not merely memorize task instances or model-specific quirks: evolved skills transfer across LLM scales and generalize to out-of-distribution (OOD) settings. For example, skills evolved by Qwen3.5-35B on its own trajectories improved a Qwen3.5-122B agent by up to 57.65 absolute percentage points on WikiTableQuestions. Ultimately, our results demonstrate that complex agent experience can be packaged into highly transferable, declarative skills -- requiring no parameter updates, no external retrieval modules, and utilizing open-source models as small as 35B parameters.
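The two-stage pipeline the abstract describes (parallel per-trajectory lesson extraction, then hierarchical consolidation into a conflict-free skill directory) can be sketched in miniature as follows. This is an illustrative skeleton only, not the paper's implementation: `extract_lessons` and `merge` stand in for LLM-driven sub-agents, and all names and data structures here are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One recorded agent execution (hypothetical schema)."""
    task: str
    steps: list = field(default_factory=list)
    success: bool = True


def extract_lessons(traj: Trajectory) -> list:
    """Sub-agent stub: in the real framework an LLM would distill
    trajectory-specific lessons; here we just tag each step."""
    tag = "do" if traj.success else "avoid"
    return [f"[{tag}] {step}" for step in traj.steps]


def merge(a: list, b: list) -> list:
    """Pairwise consolidation stub: the paper's inductive reasoning step
    would reconcile conflicting lessons; here we deduplicate in order."""
    out, seen = list(a), set(a)
    for lesson in b:
        if lesson not in seen:
            seen.add(lesson)
            out.append(lesson)
    return out


def consolidate(groups: list) -> list:
    """Hierarchical (tree) merge: pair up lesson groups level by level
    until a single unified skill directory remains."""
    while len(groups) > 1:
        nxt = []
        for i in range(0, len(groups), 2):
            if i + 1 < len(groups):
                nxt.append(merge(groups[i], groups[i + 1]))
            else:
                nxt.append(groups[i])
        groups = nxt
    return groups[0] if groups else []


def trace2skill(trajectories: list, workers: int = 4) -> list:
    """Dispatch parallel analyzers over the trajectory pool, then consolidate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        groups = list(pool.map(extract_lessons, trajectories))
    return consolidate(groups)
```

In the actual framework, both the extraction and merge steps would be LLM calls, and the output would be a structured skill document rather than a flat lesson list; the sketch only shows the parallel-then-hierarchical control flow.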