Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

Abstract

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, but the lack of structure makes the outputs difficult to verify. Conversely, constrained decoding guarantees a standardized format but can inadvertently restrict reasoning by imposing constraints too early in the generation process. We propose a hybrid approach, called In-Writing, that combines free-form reasoning and structured generation in a single call. The model first reasons without constraints and applies structured decoding only after it generates a trigger token, explicitly decoupling reasoning from formatting. We show that our trigger-token strategies nearly eliminate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations on diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms the state of the art, with accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.
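
The trigger-token idea described in the abstract can be illustrated with a toy greedy decoder. This is a minimal sketch under stated assumptions, not the authors' implementation: the trigger string `<answer>`, the function `decode_with_trigger`, the scripted `toy_scores` model, and the single-token label constraint are all hypothetical stand-ins. A real system would presumably constrain decoding with a grammar or schema rather than a flat list of allowed tokens.

```python
from typing import Callable, Dict, List, Sequence

TRIGGER = "<answer>"  # hypothetical trigger token, chosen for illustration
EOS = "<eos>"

VOCAB = ["the", "review", "is", "glowing", TRIGGER,
         "great", "positive", "negative", EOS]

# A scripted stand-in for a language model: it "reasons" for a few
# tokens, emits the trigger, and then prefers the off-format word "great".
SCRIPT = ["the", "review", "is", "glowing", TRIGGER]


def toy_scores(prefix: Sequence[str]) -> Dict[str, float]:
    step = len(prefix)
    if step < len(SCRIPT):
        return {tok: (1.0 if tok == SCRIPT[step] else 0.0) for tok in VOCAB}
    return {"great": 3.0, "positive": 2.0, "negative": 1.0}


def decode_with_trigger(
    score_fn: Callable[[Sequence[str]], Dict[str, float]],
    vocab: Sequence[str],
    allowed_after_trigger: Sequence[str],
    max_steps: int = 32,
) -> List[str]:
    """Greedy decoding that is unconstrained until TRIGGER appears,
    then restricted to `allowed_after_trigger` for one answer token."""
    output: List[str] = []
    constrained = False
    for _ in range(max_steps):
        scores = score_fn(output)
        # Before the trigger every token is a candidate; afterwards
        # only the allowed answer tokens are.
        candidates = allowed_after_trigger if constrained else vocab
        token = max(candidates, key=lambda t: scores.get(t, float("-inf")))
        output.append(token)
        if constrained or token == EOS:
            break  # simplification: the structured answer is a single token
        if token == TRIGGER:
            constrained = True
    return output


labels = ["positive", "negative"]
constrained_out = decode_with_trigger(toy_scores, VOCAB, labels)
unconstrained_out = decode_with_trigger(toy_scores, VOCAB, VOCAB)
print(constrained_out[-1], unconstrained_out[-1])
```

In this toy run the reasoning tokens before the trigger are identical in both calls. Only the token after the trigger differs: the constrained call must pick one of the two labels, while the unconstrained call picks the model's preferred but off-format word.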