ASA: Training-Free Representation Engineering for Tool-Calling Agents

Abstract

Adapting LLM agents to domain-specific tool calling remains brittle under evolving interfaces. Prompt and schema engineering is easy to deploy but often fragile under distribution shift and strict parsers, while continual parameter-efficient fine-tuning improves reliability at the cost of training, maintenance, and potential forgetting. We identify a "Lazy Agent" failure mode in which tool necessity is nearly perfectly decodable from mid-layer activations, yet the model remains conservative about entering tool mode, revealing a representation-behavior gap. We propose the Activation Steering Adapter (ASA), a training-free, inference-time controller that performs a single-shot mid-layer intervention. ASA targets tool domains via a router-conditioned mixture of steering vectors, and uses a probe-guided signed gate to amplify true intent while suppressing spurious triggers. On MTU-Bench with Qwen2.5-1.5B, ASA improves strict tool-use F1 from 0.18 to 0.50 while reducing the false positive rate from 0.15 to 0.05, using only about 20KB of portable assets and no weight updates.
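The abstract names three components (a probe-guided signed gate, a router, and a mixture of steering vectors) without giving their exact form. The following is a minimal sketch of how such an intervention could be composed on a single hidden state, not the authors' implementation. The linear probe, the gate thresholds `lo` and `hi`, the softmax router, the scale `alpha`, and all function names are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def signed_gate(h, probe_w, probe_b, lo=0.3, hi=0.7):
    """Hypothetical gate: +1 amplifies, -1 suppresses, 0 leaves h untouched.

    The linear probe and both thresholds are assumed, not taken from the paper.
    """
    p = 1.0 / (1.0 + math.exp(-(dot(probe_w, h) + probe_b)))
    if p >= hi:
        return 1
    if p <= lo:
        return -1
    return 0

def steer(h, probe_w, probe_b, router_rows, steering_vectors, alpha=1.0):
    """Apply one additive intervention to a single mid-layer hidden state h."""
    gate = signed_gate(h, probe_w, probe_b)
    # Assumed router: softmax over one linear score per tool domain.
    weights = softmax([dot(row, h) for row in router_rows])
    # Mixture of per-domain steering vectors, weighted by the router.
    mixed = [
        sum(w * v[i] for w, v in zip(weights, steering_vectors))
        for i in range(len(h))
    ]
    return [x + alpha * gate * m for x, m in zip(h, mixed)]

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(steer([2.0, 0.0], [1.0, 0.0], 0.0, identity, identity))
```

In this toy setup, a hidden state the probe scores confidently as "tool needed" is pushed along the routed steering direction, one scored confidently as "no tool" is pushed the opposite way, and an ambiguous one is left unchanged. In a real model the same update would be applied once at a chosen middle layer during the forward pass, with the probe, router, and vectors stored as small fixed arrays.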
