SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
August 1, 2023
作者: Ning Miao, Yee Whye Teh, Tom Rainforth
cs.AI
Abstract
The recent progress in large language models (LLMs), especially the invention of chain-of-thought (CoT) prompting, has made it possible to solve reasoning problems. However, even the strongest LLMs still struggle with more complicated problems that require non-linear thinking and multi-step reasoning. In this work, we explore whether LLMs have the ability to recognize their own errors, without resorting to external resources. In particular, we investigate whether they can be used to identify individual errors within a step-by-step reasoning chain. To this end, we propose a zero-shot verification scheme to recognize such errors. We then use this verification scheme to improve question-answering performance by performing weighted voting on different generated answers. We test the method on three math datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final predictive performance.
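
As a rough illustration of the weighted-voting step described in the abstract, the sketch below aggregates per-solution verification scores into a final answer. It is not the paper's implementation: the function name weighted_vote and the assumption that each generated solution comes with a confidence score in [0, 1] from a SelfCheck-style verifier are illustrative only.

from collections import defaultdict

def weighted_vote(answers, confidences):
    # answers:     final answer extracted from each generated solution
    # confidences: hypothetical verifier score per solution, in [0, 1]
    totals = defaultdict(float)
    for ans, conf in zip(answers, confidences):
        totals[ans] += conf
    # The candidate answer with the largest total weight is returned.
    return max(totals, key=totals.get)

# Example: three sampled solutions, two of which agree on "42".
print(weighted_vote(["42", "41", "42"], [0.9, 0.6, 0.7]))  # -> "42"

Summing verification scores rather than counting raw votes lets a single low-confidence (likely erroneous) solution contribute less to the final prediction, which is the mechanism the abstract credits for the accuracy gains.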