Harnessing LLM Agents with Skill Programs

Abstract

Equipping LLM agents with reusable skills derived from past experience has become a popular and effective approach to complex, long-horizon tasks. However, such lessons are often encoded as textual guidance that remains largely advisory, with no explicit mechanism for when and how to intervene in the agent loop. To bridge this gap, we introduce HASP (Harnessing LLM Agents with Skill Programs), a framework that upgrades skills into executable Program Functions (PFs). Rather than offering passive advice, PFs act as executable guardrails that activate on failure-prone states and modify the next action or inject corrective context. HASP is highly modular: it can be applied at inference time for direct agent-loop intervention, during post-training to provide structured supervision, or for self-improvement by evolving validated, teacher-reviewed PFs. Empirically, HASP yields substantial gains over both training-free and training-based methods on web-search, math reasoning, and coding tasks. For example, on web-search reasoning, inference-time PFs alone improve average performance by 25% over a (multi-loop) ReAct agent, while post-training and controlled evolution achieve a 30.4% gain over Search-R1. Our mechanism analysis further reveals how PFs trigger and intervene, how skills are internalized, and what is required for stable skill-library evolution.
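To make the idea of a PF as an executable guardrail concrete, the following is a minimal sketch, not the HASP implementation. It assumes a PF can be modeled as a trigger predicate paired with an intervention that may rewrite the proposed action and append corrective context. All names here (`AgentState`, `ProgramFunction`, `step`, and the repeated-query rule) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    """Hypothetical agent state: the task, past actions, and injected notes."""
    question: str
    history: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)


@dataclass
class ProgramFunction:
    """Assumed PF shape: fires on a failure-prone state, then intervenes."""
    name: str
    trigger: Callable[[AgentState, str], bool]
    intervene: Callable[[AgentState, str], str]


def repeated_query_trigger(state: AgentState, action: str) -> bool:
    # Failure-prone state (illustrative): the agent repeats an earlier search.
    return action.startswith("search:") and action in state.history


def repeated_query_fix(state: AgentState, action: str) -> str:
    # Inject corrective context and replace the proposed action.
    state.context.append("This query was already issued; reformulate it.")
    return "search: " + state.question


def step(state: AgentState, proposed: str, pfs: List[ProgramFunction]) -> str:
    """Run every PF over the proposed action, then record the final action."""
    action = proposed
    for pf in pfs:
        if pf.trigger(state, action):
            action = pf.intervene(state, action)
    state.history.append(action)
    return action


if __name__ == "__main__":
    pf = ProgramFunction("avoid_repeated_query",
                         repeated_query_trigger, repeated_query_fix)
    state = AgentState(question="Which river flows through Vienna?",
                       history=["search: Vienna river"])
    print(step(state, "search: Vienna river", [pf]))
    print(state.context)
```

In this sketch the PF stays silent unless its trigger matches, so actions that do not hit a failure-prone state pass through unchanged. That is the contrast the abstract draws with textual advice that is always present in the prompt but never enforced.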