The Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool-Call Suppression under Structured Output Constraints

Abstract

Tool calling and structured output are two core capabilities of modern agent systems, yet how they interact when deployed together remains poorly understood. This paper reports a reproducible phenomenon observed in a production agent system: when tool calling and JSON Schema constraints are enabled simultaneously, multiple open-weight models stop invoking tools while still maintaining high schema compliance. We refer to this behavior as Tool Suppression. In controlled experiments across multiple model families and deployment settings, Tool Suppression reproduces consistently under joint constraints, whereas tool execution and schema compliance each remain functional when evaluated independently. Further analysis reveals that JSON Schema constraints are compiled into grammar-based token masks, which make tool-call tokens unreachable during decoding; this provides an implementation-level explanation for the observed behavior. To interpret the phenomenon, we formulate the Constraint Priority Inversion (CPI) hypothesis, which suggests that schema satisfaction may dominate action selection under multiple simultaneous constraints. We present CPI as a behavioral hypothesis consistent with the observed evidence rather than as a verified internal mechanism. To mitigate the problem, we propose Transparent Two-Pass Execution, an inference-time strategy that decouples tool execution from schema-constrained response generation. Experimental results show that this approach restores tool invocation while preserving structured output guarantees, without requiring model retraining. These findings suggest that evaluating tool use and structured output separately may overlook important reliability issues in production agent systems. Code, data, and documentation will be released at https://github.com/Fzsama/Constrain-Tax-26-06.git.
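
The two mechanisms named in the abstract can be illustrated with a toy sketch. It is not the paper's implementation, which the abstract does not describe. The token names, logits, `pick_token`, `two_pass`, and the stub planner, answerer, and tool are all hypothetical, chosen only to show (1) how a grammar-derived mask can leave a tool-call token unreachable, and (2) how running tool selection and schema-constrained generation as separate passes avoids that.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Set

# --- Part 1: how a grammar mask can hide the tool-call token ---------------
# Hypothetical first-step logits; "<tool_call>" stands in for whatever
# special token a real model uses to open a tool call.
FIRST_STEP_LOGITS: Dict[str, float] = {"<tool_call>": 5.0, "{": 1.0, "Sure": 0.5}

# Assumption: a grammar compiled from a JSON object schema admits only "{"
# as the first output token.
SCHEMA_ALLOWED_FIRST: Set[str] = {"{"}


def pick_token(logits: Dict[str, float], allowed: Optional[Set[str]] = None) -> str:
    """Greedy choice over the tokens that survive the mask (None means no mask)."""
    candidates = [t for t in logits if allowed is None or t in allowed]
    return max(candidates, key=lambda t: logits[t])


# --- Part 2: two-pass execution (illustrative only) ------------------------
ToolCall = Dict[str, Any]


def two_pass(
    query: str,
    plan_tools: Callable[[str], Optional[ToolCall]],
    answer_with_schema: Callable[[str, List[Any]], str],
    tools: Dict[str, Callable[..., Any]],
) -> Dict[str, Any]:
    """Pass 1 uses tools without a schema; pass 2 uses the schema without tools."""
    observations: List[Any] = []
    call = plan_tools(query)  # pass 1: unconstrained decoding, tools enabled
    if call is not None:
        observations.append(tools[call["name"]](**call["arguments"]))
    # Pass 2: schema-constrained decoding, with tool results supplied as context.
    return json.loads(answer_with_schema(query, observations))


# Stub "models" and a stub tool so the sketch runs without any LLM backend.
def stub_planner(query: str) -> Optional[ToolCall]:
    return {"name": "add", "arguments": {"a": 2, "b": 3}}


def stub_answerer(query: str, observations: List[Any]) -> str:
    result = observations[0] if observations else None
    return json.dumps({"answer": str(result), "used_tool": bool(observations)})


TOOLS: Dict[str, Callable[..., Any]] = {"add": lambda a, b: a + b}

if __name__ == "__main__":
    print(pick_token(FIRST_STEP_LOGITS))                        # no mask
    print(pick_token(FIRST_STEP_LOGITS, SCHEMA_ALLOWED_FIRST))  # schema mask applied
    print(two_pass("What is 2 + 3?", stub_planner, stub_answerer, TOOLS))
```

In this toy setting the unmasked decoder prefers the tool-call token, while the masked decoder can only begin the JSON object, however strongly the model favors calling a tool. The two-pass wrapper lets the tool call happen first and applies the schema only to the final response.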