Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

Abstract

Large language models (LLMs) have recently been shown to deliver impressive performance on various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations, which enable LLMs to explicitly generate reasoning steps and improve their accuracy on reasoning tasks. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with "Let's think step by step" as the input prompt to the LLM. Despite its success, Zero-shot-CoT still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address missing-step errors, we propose Plan-and-Solve (PS) prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address calculation errors and improve the quality of the generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three reasoning problems. Experimental results with GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought prompting, and performs comparably to 8-shot CoT prompting on the math reasoning problem. The code can be found at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
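
The difference between the two zero-shot strategies comes down to which trigger sentence is concatenated with the problem statement. The sketch below illustrates that idea only. The Zero-shot-CoT trigger is the one quoted in the abstract; the plan-and-solve trigger wording, the "Q:/A:" template, and the helper name build_prompt are assumptions made for illustration and are not taken from the abstract or confirmed against the released code.

```python
# Trigger quoted in the abstract for Zero-shot-CoT.
ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

# ASSUMPTION: illustrative wording for a plan-and-solve style trigger.
# It mirrors the two components named in the abstract (devise a plan,
# then carry it out); the exact sentence used by the authors may differ.
PLAN_AND_SOLVE_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)


def build_prompt(problem: str, trigger: str) -> str:
    """Concatenate a problem statement with a zero-shot trigger sentence.

    ASSUMPTION: the "Q: ... A: ..." layout is a hypothetical template
    chosen for this sketch.
    """
    return f"Q: {problem.strip()}\nA: {trigger}"


if __name__ == "__main__":
    problem = "A shop sells pens at 3 dollars each. How much do 4 pens cost?"
    print(build_prompt(problem, ZERO_SHOT_COT_TRIGGER))
    print()
    print(build_prompt(problem, PLAN_AND_SOLVE_TRIGGER))
```

Both prompts need no hand-written demonstrations, which is what makes them zero-shot. The resulting string would be sent to whichever LLM is being evaluated; that call is left out because it depends on the provider.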
