When Lower Privilege Suffices: Investigating Over-Privileged Tool Selection in LLM Agents

Abstract

As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadata preferences, leaving privilege-sensitive choices underexplored. To address this gap, we study over-privileged tool selection, in which an agent selects or escalates to a higher-privilege tool even though a sufficient lower-privilege alternative is available. We introduce ToolPrivBench, which measures both initial over-privileged selection and escalation after transient tool failures. Across eight domains and five recurring risk patterns, we find that over-privileged tool selection is common among mainstream LLM agents and is further amplified by transient failures. We further find that general safety alignment does not reliably transfer to least-privilege tool choice, and that prompt-level controls provide only limited mitigation under transient failures. We therefore introduce a privilege-aware post-training defense that teaches agents to prefer sufficient lower-privilege tools and to escalate only when necessary. Our mitigation experiments show that this defense substantially reduces unnecessary high-privilege tool use while preserving general capabilities.
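
To make the definition of over-privileged tool selection concrete, the following is a minimal Python sketch of the least-privilege behavior described above. It is not the ToolPrivBench implementation or the proposed post-training defense. The `Tool` record, the integer `privilege` ordering, the capability sets, and the `TransientToolError` exception are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence


class TransientToolError(Exception):
    """Hypothetical marker for a temporary failure (e.g. timeout, rate limit)."""


@dataclass(frozen=True)
class Tool:
    name: str
    privilege: int                # assumed ordinal scale: higher means more privileged
    capabilities: FrozenSet[str]  # assumed description of what the tool can do
    run: Callable[[], str]


def select_least_privilege(tools: Sequence[Tool], required: FrozenSet[str]) -> Tool:
    """Return the lowest-privilege tool whose capabilities cover the task."""
    sufficient = [t for t in tools if required <= t.capabilities]
    if not sufficient:
        raise LookupError("no tool covers the required capabilities")
    return min(sufficient, key=lambda t: t.privilege)


def is_over_privileged(chosen: Tool, tools: Sequence[Tool], required: FrozenSet[str]) -> bool:
    """True when a sufficient tool with lower privilege than `chosen` exists."""
    return chosen.privilege > select_least_privilege(tools, required).privilege


def call_with_retry(tool: Tool, max_retries: int = 2) -> str:
    """Retry the same tool on transient failures instead of escalating privilege."""
    for attempt in range(max_retries + 1):
        try:
            return tool.run()
        except TransientToolError:
            if attempt == max_retries:
                raise  # escalation, if any, is left to the caller's explicit decision


def make_flaky(result: str, failures: int) -> Callable[[], str]:
    """Build a tool body that fails transiently `failures` times, then succeeds."""
    state = {"left": failures}

    def run() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientToolError("temporary failure")
        return result

    return run


# Hypothetical tool set: a read-only tool that fails once, and an admin tool.
read_only = Tool("read_file", 1, frozenset({"read"}), make_flaky("contents", 1))
admin = Tool("admin_shell", 3, frozenset({"read", "write", "execute"}), lambda: "contents")
TOOLS = [admin, read_only]
REQUIRED = frozenset({"read"})
```

Under these assumptions, choosing `admin_shell` for a read-only task counts as over-privileged, because `read_file` already covers the requirement. A transient failure of `read_file` is handled by retrying the same tool rather than switching to the higher-privilege one.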