

SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

August 1, 2023
作者: Ning Miao, Yee Whye Teh, Tom Rainforth
cs.AI

Abstract

The recent progress in large language models (LLMs), especially the invention of chain-of-thought (CoT) prompting, has made it possible to solve many reasoning problems. However, even the strongest LLMs still struggle with more complicated problems that require non-linear thinking and multi-step reasoning. In this work, we explore whether LLMs can recognize their own errors without resorting to external resources. In particular, we investigate whether they can be used to identify individual errors within a step-by-step reasoning chain. To this end, we propose a zero-shot verification scheme that recognizes such errors. We then use this verification scheme to improve question-answering performance by performing weighted voting over different generated answers. We test the method on three math datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final predictive performance.
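The pipeline the abstract describes (sample several CoT solutions, score each one by checking its steps, then weight each answer's vote by those scores) can be sketched in a few lines. Below is a minimal illustration, assuming a caller-supplied `check_step` function standing in for the zero-shot LLM verifier; the function names and the product rule for combining per-step scores are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of SelfCheck-style weighted voting. `check_step(context, step)`
# is assumed to return a value in [0, 1], e.g. an LLM's judged probability that
# `step` is a valid continuation of the preceding steps in `context`.
from collections import defaultdict
from typing import Callable, List, Tuple


def solution_confidence(steps: List[str],
                        check_step: Callable[[List[str], str], float]) -> float:
    """Combine per-step check results into one confidence score for a solution.

    Here we simply take the product of the step scores, so one clearly
    wrong step sinks the whole solution's score (an illustrative choice).
    """
    conf = 1.0
    for i, step in enumerate(steps):
        conf *= check_step(steps[:i], step)
    return conf


def weighted_vote(solutions: List[Tuple[List[str], str]],
                  check_step: Callable[[List[str], str], float]) -> str:
    """Pick a final answer by summing solution confidences per distinct answer."""
    votes = defaultdict(float)
    for steps, answer in solutions:
        votes[answer] += solution_confidence(steps, check_step)
    return max(votes, key=votes.get)


# Usage with a stub checker; a real system would prompt an LLM zero-shot here.
if __name__ == "__main__":
    def stub_checker(context, step):
        return 0.2 if "2 + 2 = 5" in step else 0.9

    candidates = [
        (["2 + 2 = 4", "so the answer is 4"], "4"),
        (["2 + 2 = 5", "so the answer is 5"], "5"),
    ]
    print(weighted_vote(candidates, stub_checker))  # -> "4"
```

The key design point is that voting is soft rather than majority-based: a frequent answer backed only by low-confidence reasoning chains can lose to a rarer answer whose steps all pass verification.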