

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

May 6, 2023
作者: Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, Ee-Peng Lim
cs.AI

Abstract

Large language models (LLMs) have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with "Let's think step by step" as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three reasoning problems. The experimental results over GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance with 8-shot CoT prompting on the math reasoning problem. The code can be found at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
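The contrast the abstract draws is essentially one of prompt construction: Zero-shot-CoT appends a single generic trigger sentence to the problem, while Plan-and-Solve prompting replaces it with a trigger that first asks for a plan and then for its execution. The sketch below illustrates this, assuming a simple "Q:/A:" template; the exact trigger wordings here are illustrative paraphrases of the abstract's description, not necessarily the paper's verbatim prompts.

```python
# Sketch of zero-shot prompt construction contrasting Zero-shot-CoT with
# Plan-and-Solve (PS) prompting. Trigger wordings below are illustrative
# assumptions based on the abstract, not the paper's exact text.

ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

# PS: first devise a plan dividing the task into subtasks, then carry
# out the subtasks according to the plan.
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

def build_prompt(problem: str, trigger: str) -> str:
    """Concatenate the target problem statement with a reasoning trigger."""
    return f"Q: {problem}\nA: {trigger}"

if __name__ == "__main__":
    q = "A store sold 3 boxes of 12 apples each. How many apples were sold?"
    print(build_prompt(q, ZERO_SHOT_COT_TRIGGER))
    print(build_prompt(q, PS_TRIGGER))
```

The resulting string would then be sent to the LLM as-is; PS+ extends the PS trigger with more detailed instructions (e.g., extracting relevant variables and checking intermediate calculations) to reduce calculation errors.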