当较低权限已足够时：探究LLM智能体中的过度权限工具选择

摘要

随着大语言模型智能体越来越多地自主选择工具，它们在具有不同权限的工具之间做出的选择变得与安全相关。然而，先前的工具选择研究侧重于与安全无关的元数据偏好，使得对权限敏感的选择问题尚未得到充分探索。为填补这一空白，我们研究了过度授权工具选择问题，即智能体在存在功能足够且权限更低的替代工具时，仍然选择或升级到更高权限的工具。我们推出了ToolPrivBench，用于评估智能体在存在权限更低且功能足够的替代工具时，是否仍会选择更高权限的工具，并衡量其在初始选择以及工具出现短暂故障后的升级行为。在八个领域和五种重复出现的风险模式中，我们发现过度授权工具选择在主流大语言模型智能体中普遍存在，并且工具短暂故障会进一步加剧这一问题。我们进一步发现，通用安全对齐并不能可靠地迁移到最小权限工具选择上，而提示级别的控制在工具短暂故障情况下只能提供有限的缓解效果。因此，我们引入了一种权限感知的后训练防御方法，教导智能体优先选择功能足够且权限更低的工具，仅在必要时才进行升级。我们的缓解实验表明，这种防御方法在保持通用能力的同时，显著减少了不必要的高权限工具使用。

English

As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadata preferences, leaving privilege-sensitive choices underexplored. To address this gap, we study over-privileged tool selection, in which an agent selects or escalates to a higher-privilege tool despite a sufficient lower-privilege alternative. We introduce ToolPrivBench to evaluate whether agents choose higher-privilege tools despite sufficient lower-privilege alternatives, measuring both initial selection and escalation after transient tool failures. Across eight domains and five recurring risk patterns, we find that over-privileged tool selection is common among mainstream LLM agents and is further amplified by transient failures. We further find that general safety alignment does not reliably transfer to least-privilege tool choice, while prompt-level controls provide only limited mitigation under transient failures. We therefore introduce a privilege-aware post-training defense that teaches agents to prefer sufficient lower-privilege tools and escalate only when necessary. Our mitigation experiments show that this defense substantially reduces unnecessary high-privilege tool use while preserving general capabilities.