FLARE: Faithful Logic-Aided Reasoning and Exploration
October 14, 2024
Authors: Erik Arakelyan, Pasquale Minervini, Pat Verga, Patrick Lewis, Isabelle Augenstein
cs.AI
Abstract
Modern Question Answering (QA) and Reasoning approaches based on Large
Language Models (LLMs) commonly use prompting techniques, such as
Chain-of-Thought (CoT), assuming the resulting generation will have a more
granular exploration and reasoning over the question space and scope. However,
such methods struggle with generating outputs that are faithful to the
intermediate chain of reasoning produced by the model. On the other end of the
spectrum, neuro-symbolic methods such as Faithful CoT (F-CoT) propose to
combine LLMs with external symbolic solvers. While such approaches boast a high
degree of faithfulness, they usually require a model trained for code
generation and struggle with tasks that are ambiguous or hard to formalise
strictly. We introduce Faithful Logic-Aided
Reasoning and Exploration (FLARE), a novel
interpretable approach for traversing the problem space using task
decompositions. We use the LLM to plan a solution, soft-formalise the query
into facts and predicates as logic programming code, and simulate that code's
execution via an exhaustive multi-hop search over the defined space. Our
method allows us to compute the faithfulness of the reasoning process w.r.t.
the generated code and analyse the steps of the multi-hop search without
relying on external solvers. Our method achieves state-of-the-art results on 7
out of 9 diverse reasoning benchmarks. We also show that model
faithfulness positively correlates with overall performance and further
demonstrate that FLARE allows pinpointing the decisive factors
sufficient for and leading to the correct answer with optimal reasoning during
the multi-hop search.
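To make the abstract's core mechanism concrete, the following is a minimal sketch, not the authors' implementation, of what "simulating code execution with an exhaustive multi-hop search over facts and predicates" can look like: forward-chaining over Horn-style rules, recording which rule fires at each hop so the resulting trace can later be compared against the generated code. The fact/rule encoding and function names here are illustrative assumptions.

```python
# Illustrative sketch only: forward-chaining multi-hop search over facts and
# rules, approximating the simulated logic-program execution the abstract
# describes. The tuple encoding and names are assumptions, not FLARE's API.

def multi_hop_search(facts, rules, goal, max_hops=10):
    """Derive new facts hop by hop until `goal` is found or nothing new
    can be derived. `rules` are (premises, conclusion) pairs; the returned
    trace records each fired rule, which supports faithfulness analysis."""
    known = set(facts)
    trace = []  # (hop, premises, conclusion) for every rule application
    for hop in range(max_hops):
        derived = set()
        for premises, conclusion in rules:
            if all(p in known for p in premises) and conclusion not in known:
                derived.add(conclusion)
                trace.append((hop, premises, conclusion))
        if not derived:  # fixpoint reached: search space exhausted
            break
        known |= derived
        if goal in known:
            return True, trace
    return goal in known, trace

# Toy query: is Ann the grandparent of Cid, given two parent facts?
facts = [("parent", "ann", "bob"), ("parent", "bob", "cid")]
rules = [
    ([("parent", "ann", "bob"), ("parent", "bob", "cid")],
     ("grandparent", "ann", "cid")),
]
found, trace = multi_hop_search(facts, rules, ("grandparent", "ann", "cid"))
print(found)  # True
```

Because every hop is an explicit rule application, the trace exposes exactly which premises were decisive for the answer, the property the abstract highlights when correlating faithfulness with performance.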