SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

Abstract



The recent progress in large language models (LLMs), especially the invention of chain-of-thought (CoT) prompting, has made it possible to solve reasoning problems. However, even the strongest LLMs still struggle with more complicated problems that require non-linear thinking and multi-step reasoning. In this work, we explore whether LLMs have the ability to recognize their own errors without resorting to external resources. In particular, we investigate whether they can be used to identify individual errors within step-by-step reasoning. To this end, we propose a zero-shot verification scheme to recognize such errors. We then use this verification scheme to improve question-answering performance by using it to perform weighted voting on different generated answers. We test the method on three math datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final predictive performance.
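The weighted-voting step can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. It assumes each sampled step-by-step solution has already been reduced to a final answer plus a confidence score in [0, 1] from the verifier. The abstract does not say how step-level check results become that score, so the sketch takes the scores as given. The function names `weighted_vote` and `majority_vote` and the sample values are hypothetical, and the numbers are made up for illustration, not results from the paper.

```python
from collections import defaultdict


def weighted_vote(solutions):
    """Return the answer with the largest total confidence.

    `solutions` is a list of (answer, confidence) pairs, one per sampled
    solution. The confidence is assumed (hypothetically) to come from a
    verifier such as a self-check; here it is simply taken as input.
    """
    if not solutions:
        raise ValueError("need at least one solution")
    totals = defaultdict(float)
    for answer, confidence in solutions:
        # Each solution adds its confidence to the answer it reached.
        totals[answer] += confidence
    # Pick the answer with the highest accumulated weight; on a tie,
    # max() keeps the answer that was seen first (dicts keep insertion order).
    return max(totals, key=totals.get)


def majority_vote(solutions):
    # Plain majority voting for comparison: every solution has weight 1.
    return weighted_vote([(answer, 1.0) for answer, _ in solutions])


# Hypothetical samples: two confident solutions reach "18",
# three low-confidence solutions reach "20".
samples = [("18", 0.9), ("18", 0.8), ("20", 0.3), ("20", 0.2), ("20", 0.25)]
print(majority_vote(samples))  # 20
print(weighted_vote(samples))  # 18
```

In this toy example, majority voting picks "20" because it appears three times. Weighting by confidence instead picks "18" (1.7 versus 0.75 total weight). This shows how a verifier that flags erroneous solutions can override a numerically larger but less trustworthy group of answers.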
