ASA: Training-Free Representation Engineering for Tool-Calling Agents

Abstract

Adapting LLM agents to domain-specific tool calling remains notably brittle under evolving interfaces. Prompt and schema engineering is easy to deploy but often fragile under distribution shift and strict parsers, while continual parameter-efficient fine-tuning improves reliability at the cost of training, maintenance, and potential forgetting. We identify a critical "Lazy Agent" failure mode: tool necessity is nearly perfectly decodable from mid-layer activations, yet the model remains conservative about entering tool mode, revealing a representation-behavior gap. We propose the Activation Steering Adapter (ASA), a training-free, inference-time controller that performs a single-shot mid-layer intervention. ASA targets tool domains via a router-conditioned mixture of steering vectors and uses a probe-guided signed gate to amplify true intent while suppressing spurious triggers. On MTU-Bench with Qwen2.5-1.5B, ASA improves strict tool-use F1 from 0.18 to 0.50 while reducing the false positive rate from 0.15 to 0.05, using only about 20KB of portable assets and no weight updates.
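The intervention described in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the softmax router, the logistic probe, the sign-only gate, the strength `alpha`, the `threshold`, and the function name `asa_steer` are all hypothetical choices made here for clarity.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def asa_steer(h, router_W, steering_vectors, probe_w, probe_b,
              alpha=4.0, threshold=0.5):
    """Apply one steering step to a mid-layer hidden state h of shape [d].

    All parameters are assumed to be precomputed offline (hypothetical setup):
      router_W         [k, d]  linear router over k tool domains
      steering_vectors [k, d]  one steering vector per tool domain
      probe_w, probe_b         linear probe for "is a tool needed?"
    """
    # Router: soft weights over tool domains, conditioned on h.
    weights = softmax(router_W @ h)                       # shape [k]
    # Router-conditioned mixture of steering vectors.
    v = weights @ steering_vectors                        # shape [d]
    # Probe: estimated probability that a tool call is needed.
    p = 1.0 / (1.0 + np.exp(-(probe_w @ h + probe_b)))
    # Signed gate (assumed form): push toward tool mode when the probe
    # fires, push away from it otherwise.
    gate = 1.0 if p >= threshold else -1.0
    return h + gate * alpha * v, p, gate

# Toy usage with d=4 hidden units and k=2 tool domains.
h = np.array([2.0, 0.0, 0.0, 0.0])
steered, p, gate = asa_steer(
    h,
    router_W=np.zeros((2, 4)),          # uniform router weights
    steering_vectors=np.eye(2, 4),      # two axis-aligned directions
    probe_w=np.array([1.0, 0.0, 0.0, 0.0]),
    probe_b=0.0,
)
```

In a real model the function would be applied once to the hidden state at a chosen mid layer during the forward pass. The wiring into the model is omitted here because the abstract does not specify it.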
