SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

Abstract

Large language model (LLM) agents increasingly rely on reusable external skills to solve long-horizon interactive tasks. Existing training-free skill adaptation pipelines usually update skills from full trajectories or session-level feedback, which makes failure attribution coarse and often produces unstable or overly broad revisions. We propose SkillAdaptor, a training-free, step-level skill adaptation framework with explicit failure attribution that can plug into OpenClaw-class agent harnesses. Given a failed trajectory, SkillAdaptor identifies the first actionable fault step, links responsibility to candidate skills, and applies targeted updates under explicit acceptance checks while keeping the backbone model frozen. We evaluate on WebShop, PinchBench, and Claw-Eval with Kimi-K2.5, GLM-5, and GPT-5.2. SkillAdaptor improves over no-skill and skill-adaptation baselines on all three suites, with largest single-metric improvements of +1.5 points on PinchBench Avg Score%, +1.8 points on Claw-Eval Avg Score, and +1.7 points on WebShop success rate. These results indicate that step-level attribution supports more stable and auditable training-free skill maintenance. The code will be released at https://github.com/zjunlp/SkillAdaptor.
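
The three stages named in the abstract (locate the first actionable fault step, attribute it to a candidate skill, apply an update only if it passes an acceptance check) can be illustrated with a minimal sketch. This is not the released SkillAdaptor code: `Step`, `SkillLibrary`, `first_actionable_fault`, `adapt`, and the `propose` and `accept` callbacks are all hypothetical names, and treating "actionable" as "faulty and linked to a skill in the library" is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    index: int
    action: str
    skill: Optional[str]  # hypothetical: name of the skill that guided this step
    ok: bool              # hypothetical: verdict of some step-level judge


@dataclass
class SkillLibrary:
    skills: dict[str, str] = field(default_factory=dict)  # skill name -> skill text


def first_actionable_fault(trajectory: list[Step], library: SkillLibrary) -> Optional[Step]:
    """Return the earliest faulty step that maps to a skill we can edit (assumed definition)."""
    for step in trajectory:
        if not step.ok and step.skill in library.skills:
            return step
    return None


def adapt(
    library: SkillLibrary,
    trajectory: list[Step],
    propose: Callable[[str, Step], str],
    accept: Callable[[str, str], bool],
) -> bool:
    """Apply one targeted skill update; the backbone model is never touched."""
    fault = first_actionable_fault(trajectory, library)
    if fault is None:
        return False  # nothing attributable to an existing skill
    old_text = library.skills[fault.skill]
    new_text = propose(old_text, fault)  # e.g. an LLM call in a real system
    if not accept(old_text, new_text):   # explicit acceptance check
        return False                     # reject: library stays unchanged
    library.skills[fault.skill] = new_text
    return True


# Usage with toy callbacks standing in for the LLM proposer and the checker.
library = SkillLibrary({"search": "Search for the product name."})
trajectory = [
    Step(0, "search[red shoes]", "search", True),
    Step(1, "click[first result]", "search", False),
    Step(2, "buy", None, False),
]
updated = adapt(
    library,
    trajectory,
    propose=lambda old, fault: old + " Verify attributes before clicking a result.",
    accept=lambda old, new: new != old and len(new) < 500,
)
```

Only the skill linked to the earliest attributable fault is edited, and a rejected proposal leaves the library untouched, which is what keeps each revision narrow and auditable in this sketch.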