VerifiAgent: a Unified Verification Agent in Language Model Reasoning

Abstract

Large language models demonstrate remarkable reasoning capabilities but often produce unreliable or incorrect responses. Existing verification methods are typically model-specific or domain-restricted, require significant computational resources, and do not scale across diverse reasoning tasks. To address these limitations, we propose VerifiAgent, a unified verification agent that integrates two levels of verification: meta-verification, which assesses the completeness and consistency of model responses, and tool-based adaptive verification, in which VerifiAgent autonomously selects appropriate verification tools based on the reasoning type (mathematical, logical, or commonsense reasoning). This adaptive approach ensures both efficiency and robustness across different verification scenarios. Experimental results show that VerifiAgent outperforms baseline verification methods (e.g., deductive verifier, backward verifier) across all reasoning tasks. It can also further improve reasoning accuracy by leveraging feedback from the verification results. Finally, VerifiAgent can be applied effectively to inference scaling, achieving better results with fewer generated samples and lower cost than existing process reward models in the mathematical reasoning domain. Code is available at https://github.com/Jiuzhouh/VerifiAgent.
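
The two-level design described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`Verdict`, `meta_verify`, `check_math`, `TOOLS`, `verify`) is hypothetical, the meta-verification step is reduced to checking that a final answer is stated, and the only tool is a trivial arithmetic re-computation. The reasoning type is passed in by the caller as a simplifying assumption, whereas VerifiAgent is described as selecting the tool autonomously.

```python
import operator
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Verdict:
    passed: bool
    stage: str   # which stage produced the verdict: "meta" or "tool:<name>"
    reason: str


def final_answer(response: str) -> Optional[int]:
    """Extract an integer after 'Answer:' (assumed response format)."""
    m = re.search(r"Answer:\s*(-?\d+)", response)
    return int(m.group(1)) if m else None


def meta_verify(response: str) -> Verdict:
    """Hypothetical level 1: a cheap completeness check on the response."""
    if final_answer(response) is None:
        return Verdict(False, "meta", "no final answer stated")
    return Verdict(True, "meta", "final answer present")


_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def check_math(question: str, response: str) -> Verdict:
    """Hypothetical math tool: recompute 'a <op> b' and compare answers."""
    m = re.fullmatch(r"\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*", question)
    if m is None:
        return Verdict(False, "tool:math", "question not supported by this toy tool")
    expected = _OPS[m.group(2)](int(m.group(1)), int(m.group(3)))
    got = final_answer(response)
    return Verdict(got == expected, "tool:math", f"expected {expected}, got {got}")


# Only a math tool is sketched; logical and commonsense tools are omitted.
TOOLS: Dict[str, Callable[[str, str], Verdict]] = {"math": check_math}


def verify(question: str, response: str, reasoning_type: str) -> Verdict:
    """Run meta-verification first, then the tool matching the reasoning type."""
    meta = meta_verify(response)
    if not meta.passed:
        return meta  # incomplete responses never reach the tool stage
    tool = TOOLS.get(reasoning_type)
    if tool is None:
        return Verdict(True, "meta", f"no tool registered for {reasoning_type!r}")
    return tool(question, response)
```

For example, `verify("17 * 3", "17 times 3 is 51. Answer: 51", "math")` passes at the tool stage, while a response with no stated answer is rejected at the meta stage without invoking any tool. Running the cheap check first is one plausible reading of how a layered verifier keeps cost down; the abstract itself does not specify the ordering or the checks.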