CRANE: Reasoning with constrained LLM generation
February 13, 2025
Authors: Debangshu Banerjee, Tarun Suresh, Shubham Ugare, Sasa Misailovic, Gagandeep Singh
cs.AI
Abstract
Code generation, symbolic math reasoning, and other tasks require LLMs to
produce outputs that are both syntactically and semantically correct.
Constrained LLM generation is a promising direction to enforce adherence to
formal grammar, but prior works have empirically observed that strict
enforcement of formal constraints often diminishes the reasoning capabilities
of LLMs. In this work, we first provide a theoretical explanation for why
constraining LLM outputs to very restrictive grammars that only allow
syntactically valid final answers reduces the reasoning capabilities of the
model. Second, we demonstrate that by augmenting the output grammar with
carefully designed additional rules, it is always possible to preserve the
reasoning capabilities of the LLM while ensuring syntactic and semantic
correctness in its outputs. Building on these theoretical insights, we propose
a reasoning-augmented constrained decoding algorithm, CRANE, which effectively
balances the correctness of constrained generation with the flexibility of
unconstrained generation. Experiments on multiple open-source LLMs and
benchmarks show that CRANE significantly outperforms both state-of-the-art
constrained decoding strategies and standard unconstrained decoding, showing up
to 10 percentage points of accuracy improvement over baselines on the
challenging symbolic reasoning benchmarks GSM-symbolic and FOLIO.
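The core idea described in the abstract, decoding freely during intermediate reasoning but enforcing a grammar only on the final answer, can be illustrated with a toy sketch. This is not the paper's implementation; the delimiter tokens (`<<`, `>>`), the digits-only "answer grammar", and the scripted stand-in model are all illustrative assumptions. Inside the delimiters, tokens the grammar disallows are masked out before greedy selection; outside them, decoding is unconstrained.

```python
import re

def constrained_decode(propose, max_steps=20):
    """Greedy decoding that is unconstrained outside <<...>> and, inside the
    delimiters, masks any token that is neither a digit string nor the closing
    delimiter (a stand-in for a real answer grammar)."""
    out = []
    constrained = False
    for _ in range(max_steps):
        scores = propose(out)  # dict token -> score from the toy model
        if constrained:
            # keep only tokens the answer grammar allows
            scores = {t: s for t, s in scores.items()
                      if t == ">>" or re.fullmatch(r"\d+", t)}
        if not scores:
            break
        tok = max(scores, key=scores.get)
        out.append(tok)
        if tok == "<<":
            constrained = True   # enter the grammar-constrained region
        elif tok == ">>":
            break                # answer complete
    return out

def propose(prefix):
    """Scripted toy model: emits reasoning text, opens the answer region,
    then prefers an invalid token 'x' that the grammar mask must block."""
    script = [
        {"so": 1.0},
        {"<<": 1.0},
        {"x": 2.0, "7": 1.0},  # 'x' scores higher but is grammatically invalid
        {">>": 1.0},
    ]
    return script[len(prefix)] if len(prefix) < len(script) else {}

print(constrained_decode(propose))  # ['so', '<<', '7', '>>']
```

Because the mask applies only between the delimiters, the "so" reasoning token is never restricted, which is the flexibility/correctness balance the abstract attributes to CRANE.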