VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning
January 27, 2026
Authors: Vikash Singh, Darion Cassel, Nathaniel Weir, Nick Feng, Sam Bayless
cs.AI
Abstract
Despite the syntactic fluency of Large Language Models (LLMs), ensuring their logical correctness in high-stakes domains remains a fundamental challenge. We present a neurosymbolic framework that combines LLMs with SMT solvers to produce verification-guided answers through iterative refinement. Our approach decomposes LLM outputs into atomic claims, autoformalizes them into first-order logic, and verifies their logical consistency using automated theorem proving. We introduce three key innovations: (1) multi-model consensus via formal semantic equivalence checking to ensure logic-level alignment between candidates, eliminating the syntactic bias of surface-form metrics, (2) semantic routing that directs different claim types to appropriate verification strategies: symbolic solvers for logical claims and LLM ensembles for commonsense reasoning, and (3) precise logical error localization via Minimal Correction Subsets (MCS), which pinpoint the exact subset of claims to revise, transforming binary failure signals into actionable feedback. Our framework classifies claims by their logical status and aggregates multiple verification signals into a unified score with variance-based penalty. The system iteratively refines answers using structured feedback until acceptance criteria are met or convergence is achieved. This hybrid approach delivers formal guarantees where possible and consensus verification elsewhere, advancing trustworthy AI. With the GPT-OSS-120B model, VERGE demonstrates an average performance uplift of 18.7% at convergence across a set of reasoning benchmarks compared to single-pass approaches.
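The Minimal Correction Subset (MCS) idea above can be illustrated with a small self-contained sketch. This is not the paper's implementation: it uses brute-force propositional satisfiability in place of an SMT solver, and the claims (`x`, `x implies y`, `not y`) are invented for illustration. An MCS is a smallest set of claims whose removal makes the remaining set consistent, which is exactly the "which claims to revise" feedback described above.

```python
from itertools import combinations, product

def satisfiable(claims, n_vars):
    """Brute-force check: does some boolean assignment satisfy all claims?"""
    for assignment in product([False, True], repeat=n_vars):
        if all(claim(assignment) for claim in claims):
            return True
    return False

def minimal_correction_subset(claims, n_vars):
    """Return a smallest set of claim indices whose removal leaves the
    remaining claims jointly satisfiable (a Minimal Correction Subset)."""
    if satisfiable(claims, n_vars):
        return set()  # already consistent; nothing to correct
    for k in range(1, len(claims) + 1):
        for drop in combinations(range(len(claims)), k):
            kept = [c for i, c in enumerate(claims) if i not in drop]
            if satisfiable(kept, n_vars):
                return set(drop)
    return set(range(len(claims)))

# Hypothetical atomic claims over variables x = v[0], y = v[1]:
claims = [
    lambda v: v[0],                # claim 0: x holds
    lambda v: (not v[0]) or v[1],  # claim 1: x implies y
    lambda v: not v[1],            # claim 2: y does not hold
]
mcs = minimal_correction_subset(claims, n_vars=2)
print(sorted(mcs))  # a size-1 subset: revising any one claim restores consistency
```

In a real pipeline the `satisfiable` call would be an SMT query over the autoformalized first-order claims; the enumeration-by-size structure of the MCS search is the same.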
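The abstract also mentions aggregating multiple verification signals into a unified score with a variance-based penalty. A minimal sketch of one such aggregation follows; the specific formula (mean minus a weighted population variance) and the penalty weight are assumptions for illustration, since the abstract does not state the exact form.

```python
from statistics import fmean, pvariance

def aggregate_score(signals, penalty=0.5):
    """Combine verification signals (each in [0, 1]) into one score,
    penalizing disagreement between verifiers. The weighting here is
    an illustrative assumption, not the paper's stated formula."""
    return fmean(signals) - penalty * pvariance(signals)

# Agreeing verifiers outrank conflicting ones with the same mean:
consistent = aggregate_score([0.8, 0.8, 0.8])  # zero variance, no penalty
conflicted = aggregate_score([1.0, 1.0, 0.4])  # same mean, penalized
print(consistent > conflicted)  # True
```

The effect is that a candidate answer endorsed unanimously at a moderate level scores above one endorsed strongly by some verifiers but rejected by another, which is the behavior a disagreement penalty is meant to produce.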