FLARE: Faithful Logic-Aided Reasoning and Exploration

Abstract

Modern Question Answering (QA) and reasoning approaches based on Large Language Models (LLMs) commonly use prompting techniques such as Chain-of-Thought (CoT), on the assumption that the resulting generation will explore and reason over the question space and scope at a finer granularity. However, such methods struggle to generate outputs that are faithful to the intermediate chain of reasoning produced by the model. At the other end of the spectrum, neuro-symbolic methods such as Faithful CoT (F-CoT) propose combining LLMs with external symbolic solvers. While such approaches boast a high degree of faithfulness, they usually require a model trained for code generation and struggle with tasks that are ambiguous or hard to formalise strictly. We introduce Faithful Logic-Aided Reasoning and Exploration (FLARE), a novel interpretable approach for traversing the problem space using task decompositions. We use the LLM to plan a solution, soft-formalise the query into facts and predicates using logic programming code, and simulate that code's execution using an exhaustive multi-hop search over the defined space. Our method allows us to compute the faithfulness of the reasoning process with respect to the generated code and to analyse the steps of the multi-hop search without relying on external solvers. Our method achieves state-of-the-art (SOTA) results on 7 out of 9 diverse reasoning benchmarks. We also show that model faithfulness correlates positively with overall performance, and further demonstrate that FLARE makes it possible to pinpoint the decisive factors that are sufficient for, and lead to, the correct answer with optimal reasoning during the multi-hop search.
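The abstract describes the pipeline only at a high level. As a rough illustration of what "soft-formalising a query into facts and predicates" and "simulating execution with an exhaustive multi-hop search" can look like, the Python sketch below runs a tiny Prolog-style backward-chaining search over hand-written facts and rules and records every hop it takes. Everything in it is a hypothetical stand-in and not FLARE's implementation: the toy `parent`/`ancestor` facts, the helper names (`unify`, `solve`, `faithfulness`), the `max_depth` bound, and especially the trace-overlap `faithfulness` score, which is an assumed simplification; in FLARE the LLM itself writes the code and simulates the search, and the paper's own faithfulness measure may be defined differently.

```python
from __future__ import annotations

from collections.abc import Iterator

Atom = tuple[str, ...]          # (predicate, arg1, arg2, ...)
Rule = tuple[Atom, list[Atom]]  # (head, body): head holds if every body atom holds
Subst = dict[str, str]          # variable -> bound term


def is_var(term: str) -> bool:
    # Prolog convention: variables start with an uppercase letter.
    return term[:1].isupper()


def walk(term: str, s: Subst) -> str:
    # Follow variable bindings until reaching a constant or an unbound variable.
    while is_var(term) and term in s:
        term = s[term]
    return term


def unify(a: Atom, b: Atom, s: Subst) -> Subst | None:
    # Return an extended substitution that makes a and b equal, or None if impossible.
    if len(a) != len(b) or a[0] != b[0]:
        return None
    s = dict(s)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None  # two different constants
    return s


def rename(rule: Rule, depth: int) -> Rule:
    # Give rule variables fresh names per resolution step so recursive rules don't clash.
    def fresh(atom: Atom) -> Atom:
        return (atom[0], *(f"{t}_{depth}" if is_var(t) else t for t in atom[1:]))
    head, body = rule
    return fresh(head), [fresh(b) for b in body]


def solve(goals: list[Atom], facts: set[Atom], rules: list[Rule], s: Subst,
          trace: list[tuple[str, Atom]], depth: int = 0,
          max_depth: int = 20) -> Iterator[Subst]:
    # Depth-first backward chaining that enumerates every solution within max_depth.
    # Each hop taken (a matched fact or an applied rule) is appended to trace,
    # including hops on branches that later fail.
    if not goals:
        yield s
        return
    if depth > max_depth:
        return  # bound the search so cyclic rules cannot loop forever
    goal, rest = goals[0], goals[1:]
    for fact in sorted(facts):
        s2 = unify(goal, fact, s)
        if s2 is not None:
            trace.append(("fact", fact))
            yield from solve(rest, facts, rules, s2, trace, depth + 1, max_depth)
    for rule in rules:
        head, body = rename(rule, depth)
        s2 = unify(goal, head, s)
        if s2 is not None:
            trace.append(("rule", head))
            yield from solve(body + rest, facts, rules, s2, trace, depth + 1, max_depth)


def faithfulness(simulated: list[Atom], executed: list[tuple[str, Atom]]) -> float:
    # Hypothetical score: share of facts cited by a simulated trace that the search actually used.
    used = {atom for kind, atom in executed if kind == "fact"}
    return sum(f in used for f in simulated) / len(simulated) if simulated else 0.0


# Toy soft-formalised query: "Who are Alice's descendants?"
facts = {("parent", "alice", "bob"), ("parent", "bob", "carol"), ("parent", "carol", "dave")}
rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
trace: list[tuple[str, Atom]] = []
answers = [walk("W", s) for s in solve([("ancestor", "alice", "W")], facts, rules, {}, trace)]
print(answers)  # ['bob', 'carol', 'dave']

# A made-up simulated trace that cites one fact the search never touched.
simulated = [("parent", "alice", "bob"), ("parent", "bob", "carol"),
             ("parent", "carol", "dave"), ("parent", "bob", "eve")]
print(faithfulness(simulated, trace))  # 0.75
```

Because the search enumerates every derivation up to `max_depth`, the recorded trace also contains hops on branches that fail; a comparison against a model-generated trace would likely need to decide whether such dead-end hops count.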
