VERGE: Formele Verfijnings- en Begeleidingsmotor voor Verifieerbare LLM-redenering

Samenvatting

Ondanks de syntactische vlotheid van grote taalmmodellen (LLM's) blijft het waarborgen van hun logische correctheid in hoogrisicodomeinen een fundamentele uitdaging. Wij presenteren een neurosymbolisch raamwerk dat LLM's combineert met SMT-oplossers om verificatie-gestuurde antwoorden te produceren via iteratieve verfijning. Onze aanpak decomposeert LLM-outputs in atomische beweringen, formaliseert deze automatisch naar eerstorde-logica en verifieert hun logische consistentie met geautomatiseerde bewijsvoering. Wij introduceren drie belangrijke innovaties: (1) multi-model consensus via formele semantische equivalentiecontrole om logische afstemming tussen kandidaatantwoorden te garanderen, waardoor de syntactische bias van oppervlaktevorm-metrieken wordt geëlimineerd, (2) semantisch routeren dat verschillende beweringstypen naar geschikte verificatiestrategieën leidt: symbolische oplossers voor logische beweringen en LLM-ensembles voor gezond verstand-redenering, en (3) precieze lokalisatie van logische fouten via Minimal Correction Subsets (MCS), die de exacte subset van te reviseren beweringen identificeert, waardoor binaire foutsignalen worden omgezet in actiegerichte feedback. Ons raamwerk classificeert beweringen op basis van hun logische status en aggregeert meerdere verificatiesignalen tot een uniforme score met variantie-gebaseerde penalisatie. Het systeem verfijnt antwoorden iteratief met gestructureerde feedback totdat acceptatiecriteria worden bereikt of convergentie wordt gerealiseerd. Deze hybride aanpak biedt formele garanties waar mogelijk en consensusverificatie elders, wat bijdraagt aan betrouwbare AI. Met het GPT-OSS-120B-model demonstreert VERGE een gemiddelde prestatieverbetering van 18,7% bij convergentie over een reeks redeneerbenchmarks in vergelijking met single-pass benaderingen.

English

Despite the syntactic fluency of Large Language Models (LLMs), ensuring their logical correctness in high-stakes domains remains a fundamental challenge. We present a neurosymbolic framework that combines LLMs with SMT solvers to produce verification-guided answers through iterative refinement. Our approach decomposes LLM outputs into atomic claims, autoformalizes them into first-order logic, and verifies their logical consistency using automated theorem proving. We introduce three key innovations: (1) multi-model consensus via formal semantic equivalence checking to ensure logic-level alignment between candidates, eliminating the syntactic bias of surface-form metrics, (2) semantic routing that directs different claim types to appropriate verification strategies: symbolic solvers for logical claims and LLM ensembles for commonsense reasoning, and (3) precise logical error localization via Minimal Correction Subsets (MCS), which pinpoint the exact subset of claims to revise, transforming binary failure signals into actionable feedback. Our framework classifies claims by their logical status and aggregates multiple verification signals into a unified score with variance-based penalty. The system iteratively refines answers using structured feedback until acceptance criteria are met or convergence is achieved. This hybrid approach delivers formal guarantees where possible and consensus verification elsewhere, advancing trustworthy AI. With the GPT-OSS-120B model, VERGE demonstrates an average performance uplift of 18.7% at convergence across a set of reasoning benchmarks compared to single-pass approaches.

VERGE: Formele Verfijnings- en Begeleidingsmotor voor Verifieerbare LLM-redenering

VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

Samenvatting

Support