VERGE：可验证大语言模型推理的形式化精炼与引导引擎

摘要

尽管大型语言模型（LLM）具备流畅的语法生成能力，但在高风险领域确保其逻辑正确性仍是一项根本性挑战。我们提出一种神经符号框架，将LLM与可满足性模理论（SMT）求解器相结合，通过迭代优化生成可验证的答案。该方法将LLM输出分解为原子主张，将其自动形式化为一级逻辑，并利用自动定理证明验证其逻辑一致性。我们引入三大核心创新：（1）通过形式语义等价性检查实现多模型共识，确保候选答案在逻辑层面的对齐，消除表面形式指标的语法偏差；（2）语义路由机制，将不同类型的主张导向相应验证策略——逻辑命题使用符号求解器，常识推理采用LLM集成验证；（3）基于最小修正子集（MCS）的精确逻辑错误定位技术，精确定位需要修订的主张子集，将二元失败信号转化为可操作的反馈。该框架根据逻辑状态对主张进行分类，并将多重验证信号聚合为带有方差惩罚的统一评分。系统利用结构化反馈持续迭代优化答案，直至满足接受标准或实现收敛。这种混合方法在可行处提供形式化保证，在其他场景采用共识验证，从而推动可信AI发展。实验表明，在GPT-OSS-120B模型上，VERGE框架在一系列推理基准测试中的收敛性能较单次推理方法平均提升18.7%。

English

Despite the syntactic fluency of Large Language Models (LLMs), ensuring their logical correctness in high-stakes domains remains a fundamental challenge. We present a neurosymbolic framework that combines LLMs with SMT solvers to produce verification-guided answers through iterative refinement. Our approach decomposes LLM outputs into atomic claims, autoformalizes them into first-order logic, and verifies their logical consistency using automated theorem proving. We introduce three key innovations: (1) multi-model consensus via formal semantic equivalence checking to ensure logic-level alignment between candidates, eliminating the syntactic bias of surface-form metrics, (2) semantic routing that directs different claim types to appropriate verification strategies: symbolic solvers for logical claims and LLM ensembles for commonsense reasoning, and (3) precise logical error localization via Minimal Correction Subsets (MCS), which pinpoint the exact subset of claims to revise, transforming binary failure signals into actionable feedback. Our framework classifies claims by their logical status and aggregates multiple verification signals into a unified score with variance-based penalty. The system iteratively refines answers using structured feedback until acceptance criteria are met or convergence is achieved. This hybrid approach delivers formal guarantees where possible and consensus verification elsewhere, advancing trustworthy AI. With the GPT-OSS-120B model, VERGE demonstrates an average performance uplift of 18.7% at convergence across a set of reasoning benchmarks compared to single-pass approaches.