Skills Are Not a Cure-All: Model-Aware Skill Alignment for LLM Agents

Abstract

LLM agents increasingly retrieve externally curated skills (procedural instructions) at decision time to improve performance on long-horizon interactive tasks. Existing skill libraries are typically treated as model-agnostic, reusing the same skill formulations across backbones with substantially different capacities and behaviors. However, our controlled experiments across multiple model scales show that skill effectiveness is strongly model-dependent: a skill that benefits one backbone can harm another. Motivated by this observation, we propose MASA (Model-Aware Skill Alignment), a framework that adapts skills to each target backbone without modifying agent weights. MASA operates in two stages: (1) a hierarchical skill evolution pipeline that iteratively rewrites general and task-specific skills using hill climbing and UCB-driven tree search, guided by environment feedback and model capability profiles; and (2) a lightweight model-conditioned skill rewriter trained on evolution trajectories to reproduce the adaptation in a single forward pass. Experiments across three interactive environments and four backbones show that MASA consistently achieves the best overall performance, with gains of up to 25.8 points over the strongest baseline. The learned rewriter further generalizes to unseen tasks and environments without additional search, consistently outperforming a much larger teacher LLM at a fraction of the inference cost.
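To make the "UCB-driven tree search" in stage (1) concrete, the following is a minimal sketch of the standard UCB1 selection rule applied to a tree of candidate skill rewrites. It is an illustration under stated assumptions, not the paper's implementation: the `SkillNode` structure, the `record` helper, the exploration constant `c`, and the use of a scalar environment reward are all hypothetical, since the abstract does not give the exact scoring rule.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class SkillNode:
    """One candidate rewrite of a skill (hypothetical structure, not from the paper)."""
    text: str
    visits: int = 0
    total_reward: float = 0.0
    children: list[SkillNode] = field(default_factory=list)

    @property
    def mean_reward(self) -> float:
        # Average environment reward observed for this rewrite so far.
        return self.total_reward / self.visits if self.visits else 0.0


def ucb1_score(child: SkillNode, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Standard UCB1: mean reward plus an exploration bonus."""
    if child.visits == 0:
        # Untried rewrites score infinity, so each is evaluated at least once.
        return math.inf
    return child.mean_reward + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: SkillNode, c: float = math.sqrt(2)) -> SkillNode:
    """Pick the child rewrite with the highest UCB1 score."""
    return max(parent.children, key=lambda ch: ucb1_score(ch, parent.visits, c))


def record(parent: SkillNode, child: SkillNode, reward: float) -> None:
    """Update statistics after evaluating `child` in the environment (assumed scalar reward)."""
    child.visits += 1
    child.total_reward += reward
    parent.visits += 1
```

In a loop, one would call `select_child`, run the target backbone with the selected skill text, and pass the resulting task score to `record`. Over time the rule concentrates evaluations on rewrites that help that particular backbone while still occasionally revisiting less-tested ones.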