Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

Abstract

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning by imposing constraints too early in the generation process. We propose a hybrid approach, In-Writing, that combines free-form reasoning and structured generation in a single call. The model first reasons without constraints and applies structured decoding only after it generates a trigger token, explicitly decoupling reasoning from formatting. We show that our trigger-token strategies virtually eliminate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks show that our approach outperforms the state of the art, with accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.
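
The trigger-gated decoding described above can be illustrated with a minimal sketch. This is not the authors' implementation: the trigger string `<answer>`, the functions `gated_decode`, `toy_propose`, and `toy_allowed`, and the scripted stand-in for an LLM are all assumptions made for illustration. The sketch only shows the control flow: tokens are chosen freely until the trigger appears, and only afterwards is each token restricted to what the output format permits.

```python
from typing import Callable, List, Sequence

TRIGGER = "<answer>"  # hypothetical trigger token; the paper's actual token may differ


def gated_decode(
    propose: Callable[[List[str]], Sequence[str]],
    allowed: Callable[[List[str]], Sequence[str]],
    max_steps: int = 32,
) -> List[str]:
    """Greedy decoding: free-form until TRIGGER is generated, constrained after.

    propose(prefix) returns candidate next tokens, best first (stand-in for an LLM).
    allowed(structured) returns the tokens the format permits next, given the
    structured tokens emitted so far; an empty result means the format is complete.
    """
    out: List[str] = []
    structured: List[str] = []
    triggered = False
    for _ in range(max_steps):
        candidates = propose(out)
        if not triggered:
            token = candidates[0]  # unconstrained phase: take the top candidate
            out.append(token)
            triggered = token == TRIGGER
            continue
        permitted = allowed(structured)
        if not permitted:
            break  # structured part is complete
        # Constrained phase: best-ranked candidate the format permits,
        # falling back to the first permitted token if none is proposed.
        token = next((c for c in candidates if c in permitted), permitted[0])
        out.append(token)
        structured.append(token)
    return out


def toy_propose(prefix: List[str]) -> Sequence[str]:
    """Scripted toy model: reasons for a few tokens, then emits the trigger."""
    script = ["2", "+", "2", "is", "4", TRIGGER]
    if len(prefix) < len(script):
        return [script[len(prefix)]]
    # After the trigger the toy model would rather keep talking;
    # the constraint is what forces a valid label.
    return ["so", "four", "4"]


def toy_allowed(structured: List[str]) -> Sequence[str]:
    """Toy format: exactly one label from a fixed set."""
    return ["3", "4", "5"] if not structured else []


result = gated_decode(toy_propose, toy_allowed)
print(" ".join(result))
```

In this toy run the token "4" appears during the reasoning phase without activating any constraint, because constraints apply only once the trigger has been generated. A real system would replace `toy_propose` with model logits and `toy_allowed` with a grammar or schema-driven token mask.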