Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

May 29, 2026
Authors: Jianxiang Yu, Jiapeng Zhu, Bochen Lin, Qier Cui, Zichen Ding, Xiang Li
cs.AI

Abstract

LLM agents increasingly rely on externally curated skills (procedural instructions retrieved at decision time) to improve performance on long-horizon interactive tasks. Existing skill libraries are typically treated as model-agnostic, reusing the same skill formulations across backbones with substantially different capacities and behaviors. However, our controlled experiments across multiple model scales show that skill effectiveness is strongly model-dependent: a skill that benefits one backbone can harm another. Motivated by this observation, we propose MASA (Model-Aware Skill Alignment), a framework that adapts skills to each target backbone without modifying agent weights. MASA operates in two stages: (1) a hierarchical skill evolution pipeline that iteratively rewrites general and task-specific skills using hill climbing and UCB-driven tree search, guided by environment feedback and model capability profiles; and (2) a lightweight model-conditioned skill rewriter, trained on evolution trajectories, that reproduces the adaptation in a single forward pass. Experiments across three interactive environments and four backbones show that MASA consistently achieves the best overall performance, with gains of up to 25.8 points over the strongest baseline. The learned rewriter further generalizes to unseen tasks and environments without additional search, consistently outperforming a much larger teacher LLM at a fraction of the inference cost.
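The abstract names hill climbing and UCB-driven tree search but does not spell out either procedure. For readers unfamiliar with the two techniques, the following is a minimal sketch of their textbook forms applied to skill text. It is not MASA's implementation: the `Node` structure, the `rewrite` and `evaluate` callables, and the exploration constant are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """One candidate skill text in a search tree (hypothetical structure)."""
    skill: str
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)

    @property
    def mean_reward(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Textbook UCB1 score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited candidates are always tried first
    bonus = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    return child.mean_reward + bonus


def select_child(parent: Node, c: float = math.sqrt(2)) -> Node:
    """Pick the child rewrite with the highest UCB1 score."""
    return max(parent.children, key=lambda ch: ucb1(ch, parent.visits, c))


def hill_climb(
    skill: str,
    rewrite: Callable[[str], str],     # assumed: proposes a rewritten skill
    evaluate: Callable[[str], float],  # assumed: environment feedback score
    steps: int = 5,
) -> Tuple[str, float]:
    """Greedy hill climbing: keep a rewrite only if it scores strictly higher."""
    best, best_score = skill, evaluate(skill)
    for _ in range(steps):
        candidate = rewrite(best)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

In a setting like the one the abstract describes, `evaluate` would presumably run the target backbone with the candidate skill in an environment and return a task score, and `rewrite` would call an LLM. Both are left as plain callables here so the sketch stays independent of any particular model or environment.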