ASA: Training-Free Representation Engineering for Tool-Calling Agents
February 4, 2026
Authors: Youjin Wang, Run Zhou, Rong Fu, Shuaishuai Cao, Hongwei Zeng, Jiaxuan Lu, Sicheng Fan, Jiaqiao Zhao, Liangming Pan
cs.AI
Abstract
Adapting LLM agents to domain-specific tool calling remains notably brittle under evolving interfaces. Prompt and schema engineering is easy to deploy but often fragile under distribution shift and strict parsers, while continual parameter-efficient fine-tuning improves reliability at the cost of training, maintenance, and potential forgetting. We identify a critical Lazy Agent failure mode where tool necessity is nearly perfectly decodable from mid-layer activations, yet the model remains conservative in entering tool mode, revealing a representation-behavior gap. We propose Activation Steering Adapter (ASA), a training-free, inference-time controller that performs a single-shot mid-layer intervention and targets tool domains via a router-conditioned mixture of steering vectors with a probe-guided signed gate to amplify true intent while suppressing spurious triggers. On MTU-Bench with Qwen2.5-1.5B, ASA improves strict tool-use F1 from 0.18 to 0.50 while reducing the false positive rate from 0.15 to 0.05, using only about 20KB of portable assets and no weight updates.
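The abstract names three components: a probe that decodes tool necessity, a router that weights per-domain steering vectors, and a signed gate that decides whether to amplify or suppress. The abstract gives no formulas, so the following is only a minimal sketch of how such a single-shot intervention could be wired together. The linear probe, the softmax router, the thresholds `lo` and `hi`, the scale `alpha`, and all function names are assumptions for illustration, not the authors' implementation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def signed_gate(p_tool, lo=0.3, hi=0.7):
    """Hypothetical gate: +1 amplifies, -1 suppresses, 0 leaves h unchanged.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if p_tool >= hi:
        return 1.0
    if p_tool <= lo:
        return -1.0
    return 0.0

def steer(h, probe_w, probe_b, router_w, steering_vecs, alpha=1.0):
    """Apply one additive edit to a mid-layer activation vector h.

    probe_w, probe_b: assumed linear probe for "a tool call is needed".
    router_w: one assumed scoring vector per tool domain.
    steering_vecs: one steering vector per tool domain.
    """
    p_tool = sigmoid(dot(probe_w, h) + probe_b)
    gate = signed_gate(p_tool)
    # Router-conditioned mixture: softmax weights over domain vectors.
    weights = softmax([dot(r, h) for r in router_w])
    mixed = [
        sum(w * v[i] for w, v in zip(weights, steering_vecs))
        for i in range(len(h))
    ]
    # Single additive intervention; the model weights are never touched.
    return [x + alpha * gate * m for x, m in zip(h, mixed)]
```

In this sketch the only stored assets are the probe, the router, and the steering vectors, which is the kind of small, portable footprint the abstract describes. In a real model the edit would be applied once to the hidden state at a chosen middle layer during inference.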