Skills Are Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

Abstract

LLM agents increasingly retrieve externally curated skills (procedural instructions) at decision time to improve performance on long-horizon interactive tasks. Existing skill libraries are typically treated as model-agnostic, reusing the same skill formulations across backbones with substantially different capabilities and behaviors. However, our controlled experiments across multiple model scales show that skill effectiveness is strongly model-dependent: a skill that benefits one backbone can harm another. Motivated by this observation, we propose MASA (Model-Aware Skill Alignment), a framework that adapts skills to each target backbone without modifying agent weights. MASA operates in two stages: (1) a hierarchical skill evolution pipeline that iteratively rewrites general and task-specific skills using hill climbing and UCB-driven tree search, guided by environment feedback and model capability profiles; and (2) a lightweight model-conditioned skill rewriter, trained on the evolution trajectories, that reproduces the adaptation in a single forward pass. Experiments across three interactive environments and four backbones show that MASA consistently achieves the best overall performance, with gains of up to 25.8 percentage points over the strongest baseline. The learned rewriter further generalizes to unseen tasks and environments without additional search, consistently outperforming a much larger teacher LLM at a fraction of the inference cost.
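
To make the first stage more concrete, the following is a minimal sketch of how hill climbing and a UCB1-guided tree search over skill rewrites might be structured. It is not the MASA implementation: the `Node` class, the function names, and the `rewrite` and `evaluate` callbacks are all hypothetical stand-ins. In a real system, `rewrite` would presumably call an LLM conditioned on a model capability profile, and `evaluate` would run the target backbone in the environment and return a success rate. The toy callbacks here are deterministic placeholders.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One candidate skill text in the search tree (hypothetical structure)."""
    skill: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def ucb1(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCB1 score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def hill_climb(skill: str, rewrite: Callable[[str], List[str]],
               evaluate: Callable[[str], float], steps: int = 5) -> Tuple[str, float]:
    """Greedy refinement: accept the best rewrite only if it strictly improves."""
    best, best_score = skill, evaluate(skill)
    for _ in range(steps):
        candidates = rewrite(best)
        if not candidates:
            break
        cand = max(candidates, key=evaluate)
        cand_score = evaluate(cand)
        if cand_score <= best_score:
            break  # local optimum reached
        best, best_score = cand, cand_score
    return best, best_score


def ucb_tree_search(skill: str, rewrite: Callable[[str], List[str]],
                    evaluate: Callable[[str], float],
                    iterations: int = 20) -> Tuple[str, float]:
    """Tree search over rewrites, descending by UCB1 and backing up rewards."""
    root = Node(skill)
    best, best_score = skill, -math.inf
    for _ in range(iterations):
        # Selection: follow the highest-UCB1 child down to a leaf.
        node = root
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: a leaf that was already evaluated gets rewritten children.
        if node.visits > 0:
            node.children = [Node(s, parent=node) for s in rewrite(node.skill)]
            if node.children:
                node = node.children[0]
        # Evaluation: environment feedback for this skill variant.
        reward = evaluate(node.skill)
        if reward > best_score:
            best, best_score = node.skill, reward
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best, best_score


# Toy, deterministic stand-ins for the LLM rewriter and the environment.
def toy_rewrite(skill: str) -> List[str]:
    return [skill + " verify", skill + " plan"]


def toy_evaluate(skill: str) -> float:
    return min(1.0, 0.2 * skill.count("verify") + 0.1 * skill.count("plan"))
```

Calling `hill_climb("base", toy_rewrite, toy_evaluate)` or `ucb_tree_search("base", toy_rewrite, toy_evaluate)` returns the best skill text found together with its score. The sketch treats the two procedures independently; how MASA combines them across general and task-specific skills is not specified in the abstract.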