
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

May 6, 2023
Authors: Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, Ee-Peng Lim
cs.AI

Abstract

Large language models (LLMs) have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with "Let's think step by step" as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three reasoning problems. The experimental results over GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance with 8-shot CoT prompting on the math reasoning problem. The code can be found at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
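As a rough illustration of how a Plan-and-Solve style prompt differs from Zero-shot-CoT, the sketch below appends a plan-then-solve trigger sentence to the problem statement instead of "Let's think step by step". The trigger wording paraphrases the abstract rather than quoting the paper's exact PS+ prompt, and `build_ps_plus_prompt` / `call_llm` are hypothetical helpers, not part of the released code.

```python
# Minimal sketch of zero-shot Plan-and-Solve (PS+) style prompting.
# The trigger text below paraphrases the idea in the abstract (devise a plan,
# extract relevant variables, carry out the plan with careful calculation);
# the exact wording used in the paper may differ.

ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and their "
    "corresponding numerals, and devise a plan. Then, let's carry out the plan, "
    "calculate intermediate results (paying attention to correct numerical "
    "calculation and commonsense), solve the problem step by step, and show "
    "the answer."
)


def build_ps_plus_prompt(problem: str) -> str:
    """Concatenate the target problem with the PS+ trigger, zero-shot style."""
    return f"Q: {problem}\nA: {PS_PLUS_TRIGGER}"


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder; swap in your actual LLM client call."""
    raise NotImplementedError


if __name__ == "__main__":
    question = (
        "A store sold 23 apples in the morning and twice as many in the "
        "afternoon. How many apples were sold in total?"
    )
    print(build_ps_plus_prompt(question))
```

In use, the model's completion would first state a plan (e.g., find the afternoon count, then sum the two counts) and then execute it step by step, which is what the PS/PS+ design aims to elicit without any hand-crafted demonstrations.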