BEAVER: An Efficient Deterministic LLM Verifier

December 5, 2025
Authors: Tarun Suresh, Nalin Wadhwa, Debangshu Banerjee, Gagandeep Singh
cs.AI

Abstract

As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates provide intuition about model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel token trie and frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove the soundness of our approach, and evaluate BEAVER on correctness verification, privacy verification, and secure code generation tasks across multiple state-of-the-art LLMs. Under identical computational budgets, BEAVER achieves 6 to 8 times tighter probability bounds and identifies 3 to 4 times more high-risk instances than baseline methods, enabling precise characterization and risk assessment that loose bounds or empirical evaluation cannot provide.
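The core idea in the abstract, exploring the generation space while maintaining sound probability bounds, can be illustrated with a small sketch. This is not BEAVER's actual implementation; the toy model, the constraint, and all function names here are hypothetical. The key property it exercises is prefix-closure: once a prefix violates the constraint, every continuation does too, so that prefix's entire probability mass can be assigned to the violating side without further expansion. Mass in the unexplored frontier is what separates the lower and upper bounds.

```python
import heapq

# Toy stand-in for an LLM's next-token distribution (hypothetical, not BEAVER's API):
# uniform over tokens 'a' and 'b' for two steps, then end-of-sequence.
def next_token_probs(prefix):
    if len(prefix) >= 2:
        return {"<eos>": 1.0}
    return {"a": 0.5, "b": 0.5}

def satisfies(seq):
    # Example constraint: the output must not contain "bb".
    return "bb" not in seq

def violates_prefix(prefix):
    # Prefix-closed violation: once "bb" appears, no extension can satisfy.
    return "bb" in prefix

def verify(eps=1e-9):
    """Return sound (lower, upper) bounds on P(output satisfies constraint)."""
    sat_mass, viol_mass = 0.0, 0.0
    # Frontier as a max-heap on probability: expand likely prefixes first,
    # which tightens the bounds fastest per model call.
    frontier = [(-1.0, "")]
    while frontier and (1.0 - sat_mass - viol_mass) > eps:
        neg_p, prefix = heapq.heappop(frontier)
        p = -neg_p
        if violates_prefix(prefix):
            viol_mass += p  # all continuations violate; prune this subtree
            continue
        for tok, q in next_token_probs(prefix).items():
            if tok == "<eos>":
                if satisfies(prefix):
                    sat_mass += p * q
                else:
                    viol_mass += p * q
            else:
                heapq.heappush(frontier, (-(p * q), prefix + tok))
    # Unexplored frontier mass widens the interval; bounds stay sound
    # at every iteration even if the budget runs out early.
    rest = max(0.0, 1.0 - sat_mass - viol_mass)
    return sat_mass, sat_mass + rest
```

On this toy model the four length-2 outputs each carry probability 0.25 and only "bb" violates, so the bounds converge to the exact satisfaction probability 0.75. Stopping the loop early (a smaller budget) would instead return a wider but still sound interval.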