Skills Are Not Universal: Model-Aware Skill Alignment for LLM Agents

Abstract

LLM agents increasingly rely on externally curated skills (procedural instructions retrieved at decision time) to improve performance on long-horizon interactive tasks. Existing skill libraries are typically treated as model-agnostic, reusing the same skill formulations across backbones with substantially different capacities and behaviors. However, our controlled experiments across multiple model scales show that skill effectiveness is strongly model-dependent: a skill that benefits one backbone can harm another. Motivated by this observation, we propose MASA (Model-Aware Skill Alignment), a framework that adapts skills to each target backbone without modifying agent weights. MASA operates in two stages: (1) a hierarchical skill evolution pipeline that iteratively rewrites general and task-specific skills using hill climbing and UCB-driven tree search, guided by environment feedback and model capability profiles; and (2) a lightweight model-conditioned skill rewriter, trained on the evolution trajectories, that reproduces the adaptation in a single forward pass. Experiments across three interactive environments and four backbones show that MASA consistently achieves the best overall performance, with gains of up to 25.8 points over the strongest baseline. The learned rewriter further generalizes to unseen tasks and environments without additional search, consistently outperforming a much larger teacher LLM at a fraction of the inference cost.
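To make the "UCB-driven tree search" in stage (1) concrete, the following is a minimal sketch of how a search over candidate skill rewrites might pick the next variant to evaluate. The abstract does not specify which UCB variant MASA uses or how rewards are defined, so this assumes the standard UCB1 rule with task success rate as the reward. The names `ucb1_score` and `select_child` and the dictionary fields are hypothetical, chosen only for illustration.

```python
import math


def ucb1_score(mean_reward, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: exploitation term plus exploration bonus.

    Assumption: the reward is something like a task success rate in [0, 1].
    Unvisited candidates get an infinite score so each is tried at least once.
    """
    if visits == 0:
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=math.sqrt(2)):
    """Pick the candidate skill rewrite with the highest UCB1 score.

    `children` is a list of dicts with hypothetical fields:
    "name", "visits", and "total_reward".
    """
    parent_visits = sum(ch["visits"] for ch in children)

    def score(ch):
        mean = ch["total_reward"] / ch["visits"] if ch["visits"] else 0.0
        return ucb1_score(mean, ch["visits"], parent_visits, c)

    return max(children, key=score)


# Hypothetical candidates: two rewrites of the same skill for one backbone.
candidates = [
    {"name": "rewrite_a", "visits": 10, "total_reward": 6.0},
    {"name": "rewrite_b", "visits": 2, "total_reward": 1.0},
]
chosen = select_child(candidates)
```

With these numbers the less-explored `rewrite_b` is selected even though its mean reward is lower, because its exploration bonus is larger. Setting `c=0` reduces the rule to a greedy choice, which is closer in spirit to the hill-climbing component mentioned alongside it.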