CRANE: Reasoning with constrained LLM generation
February 13, 2025
Authors: Debangshu Banerjee, Tarun Suresh, Shubham Ugare, Sasa Misailovic, Gagandeep Singh
cs.AI
Abstract
Code generation, symbolic math reasoning, and other tasks require LLMs to
produce outputs that are both syntactically and semantically correct.
Constrained LLM generation is a promising direction to enforce adherence to
formal grammar, but prior works have empirically observed that strict
enforcement of formal constraints often diminishes the reasoning capabilities
of LLMs. In this work, we first provide a theoretical explanation for why
constraining LLM outputs to very restrictive grammars that only allow
syntactically valid final answers reduces the reasoning capabilities of the
model. Second, we demonstrate that by augmenting the output grammar with
carefully designed additional rules, it is always possible to preserve the
reasoning capabilities of the LLM while ensuring syntactic and semantic
correctness in its outputs. Building on these theoretical insights, we propose
a reasoning-augmented constrained decoding algorithm, CRANE, which effectively
balances the correctness of constrained generation with the flexibility of
unconstrained generation. Experiments on multiple open-source LLMs and
benchmarks show that CRANE significantly outperforms both state-of-the-art
constrained decoding strategies and standard unconstrained decoding, showing up
to 10 percentage points of accuracy improvement over baselines on the
challenging symbolic reasoning benchmarks GSM-symbolic and FOLIO.
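The core idea described in the abstract, decoding freely during intermediate reasoning but enforcing a grammar only on the final answer, can be illustrated with a toy sketch. This is not the paper's implementation; the delimiter tokens (`<<`, `>>`), the digits-only "answer grammar", and the scripted stand-in model are all illustrative assumptions. Inside the delimiters, tokens the grammar disallows are masked out before greedy selection; outside them, decoding is unconstrained.

```python
import re

def constrained_decode(propose, max_steps=20):
    """Greedy decoding that is unconstrained outside <<...>> and, inside the
    delimiters, masks any token that is neither a digit string nor the closing
    delimiter (a stand-in for a real answer grammar)."""
    out = []
    constrained = False
    for _ in range(max_steps):
        scores = propose(out)  # dict token -> score from the toy model
        if constrained:
            # keep only tokens the answer grammar allows
            scores = {t: s for t, s in scores.items()
                      if t == ">>" or re.fullmatch(r"\d+", t)}
        if not scores:
            break
        tok = max(scores, key=scores.get)
        out.append(tok)
        if tok == "<<":
            constrained = True   # enter the grammar-constrained region
        elif tok == ">>":
            break                # answer complete
    return out

def propose(prefix):
    """Scripted toy model: emits reasoning text, opens the answer region,
    then prefers an invalid token 'x' that the grammar mask must block."""
    script = [
        {"so": 1.0},
        {"<<": 1.0},
        {"x": 2.0, "7": 1.0},  # 'x' scores higher but is grammatically invalid
        {">>": 1.0},
    ]
    return script[len(prefix)] if len(prefix) < len(script) else {}

print(constrained_decode(propose))  # ['so', '<<', '7', '>>']
```

Because the mask applies only between the delimiters, the "so" reasoning token is never restricted, which is the flexibility/correctness balance the abstract attributes to CRANE.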