ChatPaper.aiChatPaper

BEAVER:一种高效确定性大语言模型验证器

BEAVER: An Efficient Deterministic LLM Verifier

December 5, 2025
作者: Tarun Suresh, Nalin Wadhwa, Debangshu Banerjee, Gagandeep Singh
cs.AI

摘要

随着大型语言模型从研究原型转向生产系统,从业者往往需要可靠的方法来验证模型输出是否满足既定约束。虽然基于采样的估计能提供模型行为的直观认知,但无法给出严格保证。我们提出BEAVER框架——首个可实际计算LLM约束满足度的确定性概率边界的方法。该框架针对任意前缀封闭的语义约束,通过创新的词汇树和边界数据结构系统性地探索生成空间,并在每次迭代中保持可证明的严格边界。我们形式化验证问题,证明方法的可靠性,并在多个前沿LLM上对BEAVER进行正确性验证、隐私验证及安全代码生成任务的评估。在相同计算资源下,BEAVER获得的概率边界比基线方法精确6至8倍,高风险实例识别数量提升3至4倍,实现了松散边界或经验评估无法达成的精准特性描述与风险评估。
English
As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates provide an intuition of model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel token trie and frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove soundness of our approach, and evaluate BEAVER on correctness verification, privacy verification and secure code generation tasks across multiple state of the art LLMs. BEAVER achieves 6 to 8 times tighter probability bounds and identifies 3 to 4 times more high risk instances compared to baseline methods under identical computational budgets, enabling precise characterization and risk assessment that loose bounds or empirical evaluation cannot provide.
PDF131December 13, 2025