ASA: Training-Free Representation Engineering for Tool-Calling Agents

Abstract

Adapting LLM agents to domain-specific tool calling remains brittle under evolving interfaces. Prompt and schema engineering is easy to deploy but often fragile under distribution shift and strict parsers, while continual parameter-efficient fine-tuning improves reliability at the cost of training, maintenance, and potential forgetting. We identify a critical "Lazy Agent" failure mode: tool necessity is nearly perfectly decodable from mid-layer activations, yet the model remains conservative in entering tool mode, revealing a representation-behavior gap. We propose the Activation Steering Adapter (ASA), a training-free, inference-time controller that performs a single-shot mid-layer intervention. ASA targets tool domains via a router-conditioned mixture of steering vectors, combined with a probe-guided signed gate that amplifies true intent while suppressing spurious triggers. On MTU-Bench with Qwen2.5-1.5B, ASA improves strict tool-use F1 from 0.18 to 0.50 while reducing the false positive rate from 0.15 to 0.05, using only about 20 KB of portable assets and no weight updates.
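The intervention described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `asa_steer`, the softmax router, the logistic probe, the hard threshold that sets the gate's sign, and the scale `alpha` are all hypothetical. The sketch only shows the general shape of the idea: a router mixes per-domain steering vectors, a probe estimates tool necessity, and the gate's sign decides whether the mixed direction is added (amplify intent) or subtracted (suppress a spurious trigger) in a single update of one mid-layer hidden state.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def asa_steer(hidden, router_weights, steering_vectors,
              probe_w, probe_b, alpha=1.0, threshold=0.5):
    """Hypothetical single-shot steering step on one hidden state.

    hidden:           mid-layer activation (list of floats)
    router_weights:   one weight vector per tool domain (assumed linear router)
    steering_vectors: one steering vector per tool domain
    probe_w, probe_b: assumed linear probe for "tool needed"
    """
    # Router: soft assignment of the input to tool domains.
    mix = softmax([dot(w, hidden) for w in router_weights])
    # Router-conditioned mixture of steering vectors.
    direction = [
        sum(m * v[i] for m, v in zip(mix, steering_vectors))
        for i in range(len(hidden))
    ]
    # Probe: estimated probability that a tool call is needed.
    p_tool = sigmoid(dot(probe_w, hidden) + probe_b)
    # Signed gate (assumed form): +1 amplifies, -1 suppresses.
    gate = 1.0 if p_tool >= threshold else -1.0
    steered = [h + alpha * gate * d for h, d in zip(hidden, direction)]
    return steered, p_tool, gate

# Toy usage with two tool domains and a 2-dimensional hidden state.
steered, p_tool, gate = asa_steer(
    hidden=[1.0, 0.0],
    router_weights=[[1.0, 0.0], [0.0, 1.0]],
    steering_vectors=[[1.0, 0.0], [0.0, 1.0]],
    probe_w=[2.0, 0.0],
    probe_b=0.0,
)
print(gate, round(p_tool, 3), [round(x, 3) for x in steered])
```

In a real model the update would be applied once, at a chosen layer, through a forward hook or equivalent. The router, probe, and steering vectors are small enough that they could plausibly account for the compact asset size the abstract mentions, though the sketch makes no claim about how the paper stores them.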
