Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints
June 24, 2026
Authors: Fangzheng Li, Aimin Zhang, Chen Lv
cs.AI
Abstract
Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their interaction under joint deployment conditions remains insufficiently understood. This paper reports a reproducible phenomenon observed in a production Agent system: when Tool Calling and JSON Schema constraints are simultaneously enabled, multiple open-weight models cease invoking tools despite maintaining high schema compliance. We refer to this behavior as Tool Suppression. Through controlled experiments across multiple model families and deployment settings, we consistently reproduce Tool Suppression under joint constraints, while tool execution and schema compliance remain functional when evaluated independently. Further analysis reveals that JSON Schema constraints are compiled into grammar-based token masks, causing tool-call tokens to become unreachable during decoding. This provides an implementation-level explanation for the observed behavior. To interpret the phenomenon, we formulate the Constraint Priority Inversion (CPI) hypothesis, which suggests that schema satisfaction may dominate action-selection behavior under multiple simultaneous constraints. We present CPI as a behavioral hypothesis consistent with the observed evidence rather than a verified internal mechanism. To mitigate the problem, we propose Transparent Two-Pass Execution, an inference-time strategy that decouples tool execution from schema-constrained response generation. Experimental results show that this approach restores tool invocation while preserving structured output guarantees without requiring model retraining. These findings suggest that evaluating tool use and structured output separately may overlook important reliability issues in production Agent systems. Code, data, and docs will be released at https://github.com/Fzsama/Constrain-Tax-26-06.git.
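The masking effect and the two-pass mitigation described above can be illustrated with a toy decoder. This is a minimal sketch under stated assumptions, not the authors' implementation: the vocabulary, the `<tool_call>` token, the stand-in `toy_logits` model, and the single-string "grammar" are all hypothetical. It only shows how a mask compiled from a schema can make a tool-call token unreachable, and how running an unconstrained pass first can restore the call.

```python
import math

# Hypothetical toy vocabulary; "<tool_call>" stands in for whatever token
# a real model emits to begin a tool invocation.
VOCAB = ["<tool_call>", "{", '"answer"', ":", '"42"', "}", "<eos>"]

# The only token sequence the toy "compiled schema grammar" accepts.
SCHEMA_TOKENS = ["{", '"answer"', ":", '"42"', "}", "<eos>"]


def toy_logits(prefix, has_tool_result):
    """Stand-in for a model: prefers a tool call when no tool result is in context."""
    scores = {tok: 0.0 for tok in VOCAB}
    if not prefix and not has_tool_result:
        scores["<tool_call>"] = 5.0  # strong preference for calling a tool first
    elif prefix and prefix[-1] == "<tool_call>":
        scores["<eos>"] = 5.0  # end the turn after emitting the call
    elif len(prefix) < len(SCHEMA_TOKENS):
        scores[SCHEMA_TOKENS[len(prefix)]] = 3.0  # otherwise follow the schema
    else:
        scores["<eos>"] = 3.0
    return scores


def grammar_mask(prefix):
    """Tokens the schema grammar allows next; "<tool_call>" is never among them."""
    if len(prefix) < len(SCHEMA_TOKENS):
        return {SCHEMA_TOKENS[len(prefix)]}
    return {"<eos>"}


def decode(has_tool_result, constrained, max_steps=16):
    """Greedy decoding, optionally with the grammar mask applied to the scores."""
    prefix = []
    for _ in range(max_steps):
        scores = toy_logits(prefix, has_tool_result)
        if constrained:
            allowed = grammar_mask(prefix)
            # Disallowed tokens get -inf, so they can never be selected.
            scores = {t: (s if t in allowed else -math.inf) for t, s in scores.items()}
        prefix.append(max(scores, key=scores.get))
        if prefix[-1] == "<eos>":
            break
    return prefix


def two_pass():
    """Pass 1: tools enabled, no schema mask. Pass 2: schema mask, tool result in context."""
    first = decode(has_tool_result=False, constrained=False)
    called_tool = "<tool_call>" in first
    # A real system would execute the tool here and append its result to the context.
    second = decode(has_tool_result=called_tool, constrained=True)
    return called_tool, second


if __name__ == "__main__":
    print("joint constraints:", decode(has_tool_result=False, constrained=True))
    print("two-pass:", two_pass())
```

In this toy setting, decoding under the joint constraint yields schema-valid output without ever emitting the tool-call token, even though the stand-in model scores that token highest at the first step. The two-pass variant emits the call in the unconstrained pass and still produces schema-valid output in the second pass.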