SkillGrad: Optimizing Agent Skills Like Gradient Descent

Abstract

Agent skills provide a lightweight way to adapt large language model (LLM) agents to specialized domains by storing reusable procedural knowledge in structured files. However, whether downloaded from third parties or self-generated, these skills are often unreliable, incomplete, or outdated. Existing skill-evolution methods typically address these deficiencies through heuristic reflection and lack an explicit optimization formulation. In this paper, we propose SkillGrad, a gradient-descent-inspired framework for optimizing agent skills. SkillGrad treats the skill package as a structured parameter and optimizes it in a gradient-descent fashion: task executions provide trajectory-level loss evidence, and automatic diagnoses provide text-based gradients that indicate the direction of correction. To stabilize optimization across iterations, a momentum agent accumulates recurring diagnostic patterns into a persistent memory overlay. Finally, an LLM-based patcher performs the parameter update by applying layer-aware edits to the skill package. On SpreadsheetBench Verified and WikiTableQuestions, SkillGrad consistently outperforms training-based skill-evolution baselines across two backbone LLMs, improving over the strongest training-based baseline by 6.7 percentage points on average. Ablations further show that momentum and contrastive diagnosis both contribute to the final skill quality.
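
The loop described in the abstract (execute tasks, diagnose failures, accumulate momentum, patch the skill package) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: every name here (`SkillPackage`, `Trajectory`, `Diagnosis`, `Momentum`, `optimize`, and the `toy_*` functions) is hypothetical, the skill package is assumed to be a mapping from layer name to text, and the LLM-based executor, diagnoser, and patcher are replaced by simple string-matching stand-ins so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

SkillPackage = Dict[str, str]  # assumed layout: layer name -> text
Key = Tuple[str, str]          # (layer, failure pattern)

@dataclass
class Trajectory:
    task: str
    success: bool
    log: str

@dataclass
class Diagnosis:
    layer: str    # which layer of the skill package to edit
    pattern: str  # short label for the failure mode
    advice: str   # the text-based "gradient"

@dataclass
class Momentum:
    """Decayed counts of diagnostic patterns (hypothetical design)."""
    decay: float = 0.5
    weights: Dict[Key, float] = field(default_factory=dict)
    advice: Dict[Key, str] = field(default_factory=dict)

    def update(self, diagnoses: List[Diagnosis]) -> None:
        # Decay old evidence, then add one unit per new diagnosis.
        self.weights = {k: w * self.decay for k, w in self.weights.items()}
        for d in diagnoses:
            key = (d.layer, d.pattern)
            self.weights[key] = self.weights.get(key, 0.0) + 1.0
            self.advice[key] = d.advice

    def overlay(self, threshold: float = 1.5) -> List[str]:
        # Only patterns that recur across iterations cross the threshold.
        return sorted(self.advice[k] for k, w in self.weights.items() if w >= threshold)

def optimize(skills: SkillPackage, tasks: List[str], run: Callable,
             diagnose: Callable, patch: Callable, steps: int = 3):
    momentum, losses = Momentum(), []
    for _ in range(steps):
        trajectories = [run(skills, t) for t in tasks]
        failures = [t for t in trajectories if not t.success]
        successes = [t for t in trajectories if t.success]
        losses.append(len(failures) / len(tasks))  # trajectory-level loss
        if not failures:
            break
        diagnoses = diagnose(failures, successes)  # failures contrasted with successes
        momentum.update(diagnoses)
        skills = patch(skills, diagnoses, momentum.overlay())  # parameter update
    return skills, losses

# Toy stand-ins for the LLM-backed components (assumptions for illustration).
def toy_run(skills: SkillPackage, task: str) -> Trajectory:
    ok = task in skills.get("procedure", "")
    return Trajectory(task, ok, "ok" if ok else f"missing step for {task}")

def toy_diagnose(failures: List[Trajectory], successes: List[Trajectory]) -> List[Diagnosis]:
    return [Diagnosis("procedure", f"missing:{t.task}", f"Add a step for {t.task}.")
            for t in failures]

def toy_patch(skills: SkillPackage, diagnoses: List[Diagnosis],
              overlay: List[str]) -> SkillPackage:
    updated = dict(skills)
    for d in diagnoses:  # layer-aware edit: touch only the diagnosed layer
        updated[d.layer] = updated.get(d.layer, "") + "\n" + d.advice
    if overlay:          # persistent memory overlay kept in its own layer
        updated["memory"] = "\n".join(overlay)
    return updated

tasks = ["sort_rows", "merge_cells"]
final, losses = optimize({"procedure": "Use pandas."}, tasks,
                         toy_run, toy_diagnose, toy_patch)
```

In this toy setting the loss reaches zero after one patch, so the momentum overlay is never written. In a realistic run, where a single patch rarely resolves a failure mode, the decayed weights would let patterns that keep reappearing persist in the `memory` layer while one-off diagnoses fade.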