Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

Abstract

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing constraints too early in the generation process. We propose a hybrid approach, In-Writing, that combines free-form reasoning and structured generation in a single call. The model first performs unconstrained reasoning and applies structured decoding only after a trigger token is generated, explicitly decoupling reasoning from formatting. We show that our trigger-token strategies virtually eliminate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms state-of-the-art methods, achieving accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.
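
The two-phase decoding described above can be illustrated with a minimal toy sketch. It is not the authors' implementation: the trigger string `<answer>`, the function names `decode_with_trigger` and `toy_score_fn`, and the single-label output format are all assumptions made for illustration. A real system would mask model logits against a grammar or schema rather than filter a dictionary of scores.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical trigger token; the abstract does not name a specific one.
TRIGGER = "<answer>"


def decode_with_trigger(
    score_fn: Callable[[List[str]], Dict[str, float]],
    allowed_after_trigger: Sequence[str],
    trigger: str = TRIGGER,
    max_steps: int = 32,
) -> List[str]:
    """Greedy decoding: unconstrained until `trigger` appears, then restricted."""
    output: List[str] = []
    constrained = False
    for _ in range(max_steps):
        scores = score_fn(output)
        if constrained:
            # Structured phase: keep only tokens the output format allows.
            scores = {t: s for t, s in scores.items() if t in allowed_after_trigger}
            if not scores:
                break
        token = max(scores, key=scores.get)
        output.append(token)
        if constrained:
            break  # toy format: a single label token completes the answer
        if token == trigger:
            constrained = True  # switch phases only after the trigger is emitted
    return output


def toy_score_fn(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a model: follows a fixed reasoning script, then prefers 'maybe'."""
    script = ["2", "+", "2", "is", "four", TRIGGER]
    scores = {t: 0.0 for t in set(script) | {"A", "B", "maybe"}}
    if len(prefix) < len(script):
        scores[script[len(prefix)]] = 1.0
    else:
        scores["maybe"] = 1.0  # unconstrained favourite, not a valid label
        scores["B"] = 0.5      # best valid label
    return scores


result = decode_with_trigger(toy_score_fn, allowed_after_trigger=["A", "B"])
print(result)
```

In this sketch the reasoning tokens are emitted without any restriction, and the constraint is applied only to the token that follows the trigger, so the invalid but higher-scoring `maybe` is excluded in favor of the valid label `B`.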