
VerifiAgent: a Unified Verification Agent in Language Model Reasoning

April 1, 2025
Authors: Jiuzhou Han, Wray Buntine, Ehsan Shareghi
cs.AI

Abstract

Large language models demonstrate remarkable reasoning capabilities but often produce unreliable or incorrect responses. Existing verification methods are typically model-specific or domain-restricted, require significant computational resources, and lack scalability across diverse reasoning tasks. To address these limitations, we propose VerifiAgent, a unified verification agent that integrates two levels of verification: meta-verification, which assesses completeness and consistency in model responses, and tool-based adaptive verification, where VerifiAgent autonomously selects appropriate verification tools based on the reasoning type, including mathematical, logical, or commonsense reasoning. This adaptive approach ensures both efficiency and robustness across different verification scenarios. Experimental results show that VerifiAgent outperforms baseline verification methods (e.g., deductive verifier, backward verifier) across all reasoning tasks. Additionally, it can further enhance reasoning accuracy by leveraging feedback from verification results. VerifiAgent can also be effectively applied to inference scaling, achieving better results with fewer generated samples and lower cost than existing process reward models in the mathematical reasoning domain. Code is available at https://github.com/Jiuzhouh/VerifiAgent
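
The two-level structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). Every name here (`ReasoningType`, `Verdict`, `meta_verify`, `check_arithmetic`, `verify`) is hypothetical, the meta-verification check is a crude string heuristic standing in for a model-based judgment, and the reasoning type is passed in by the caller rather than selected autonomously by an agent.

```python
import operator
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class ReasoningType(Enum):
    MATH = "math"
    LOGIC = "logic"
    COMMONSENSE = "commonsense"


@dataclass
class Verdict:
    passed: bool
    feedback: str  # could be fed back to the reasoner for a retry


# A tool takes (response, final_answer) and returns a Verdict.
Tool = Callable[[str, str], Verdict]


def meta_verify(response: str, final_answer: str) -> Verdict:
    """Level 1 (assumed heuristic): is the response complete and consistent?"""
    if not response.strip():
        return Verdict(False, "empty response")
    if final_answer not in response:
        return Verdict(False, "final answer does not appear in the reasoning")
    return Verdict(True, "response looks complete and consistent")


_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
_STEP = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)")


def check_arithmetic(response: str, final_answer: str) -> Verdict:
    """Level 2 stand-in for a math tool: recompute each 'a op b = c' step."""
    for a, op, b, claimed in _STEP.findall(response):
        if _OPS[op](int(a), int(b)) != int(claimed):
            return Verdict(False, f"{a} {op} {b} does not equal {claimed}")
    return Verdict(True, "all arithmetic steps check out")


def verify(
    response: str,
    final_answer: str,
    reasoning_type: ReasoningType,
    tools: Dict[ReasoningType, Tool],
) -> Verdict:
    """Run meta-verification first, then the tool registered for this type."""
    meta = meta_verify(response, final_answer)
    if not meta.passed:
        return meta  # skip the more expensive tool call
    tool = tools.get(reasoning_type)
    if tool is None:
        return Verdict(True, meta.feedback + " (no tool registered)")
    return tool(response, final_answer)
```

Running the cheap meta-level check first means the tool call is skipped when a response is already known to be incomplete. A logic or commonsense tool (for example a solver or a search lookup) would be registered in the same `tools` mapping under its own `ReasoningType` key.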
