Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

Abstract

Large language models (LLMs) have recently delivered impressive performance on a variety of NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations, which enable LLMs to generate explicit reasoning steps and improve their accuracy on reasoning tasks. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with "Let's think step by step" and feeds the result to the LLM as the input prompt. Despite its success, Zero-shot-CoT still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address missing-step errors, we propose Plan-and-Solve (PS) prompting. It consists of two components: first, devising a plan that divides the entire task into smaller subtasks, and second, carrying out the subtasks according to the plan. To address calculation errors and improve the quality of the generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate the proposed prompting strategies on ten datasets across three reasoning problems. Experimental results on GPT-3 show that our zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought prompting, and performs comparably to 8-shot CoT prompting on the math reasoning problem. The code is available at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
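
The prompt construction described in the abstract can be illustrated with a minimal sketch. Only the Zero-shot-CoT trigger "Let's think step by step" is quoted from the abstract. The plan-and-solve trigger wording below is a hypothetical paraphrase of the two components (devise a plan, then carry it out), and the "Q: ... A: ..." layout and the helper name build_prompt are assumptions made for illustration, not the authors' implementation.

```python
# Trigger phrase quoted in the abstract for Zero-shot-CoT.
ZERO_SHOT_COT_TRIGGER = "Let's think step by step"

# Hypothetical wording: a paraphrase of the two PS components named in the
# abstract (plan first, then execute). Not the authors' exact prompt.
PLAN_AND_SOLVE_TRIGGER = (
    "Let's first devise a plan that divides the task into smaller subtasks. "
    "Then let's carry out the subtasks according to the plan"
)


def build_prompt(problem: str, trigger: str) -> str:
    """Concatenate a problem statement with a zero-shot reasoning trigger.

    The "Q: ... A: ..." template is an assumed layout for illustration.
    """
    return f"Q: {problem.strip()}\nA: {trigger}"


if __name__ == "__main__":
    problem = "A shop sells 3 pens for 2 dollars. How much do 12 pens cost?"
    # Same problem, two different zero-shot triggers.
    print(build_prompt(problem, ZERO_SHOT_COT_TRIGGER))
    print()
    print(build_prompt(problem, PLAN_AND_SOLVE_TRIGGER))
```

In this sketch the two strategies differ only in the trigger string appended after the problem, which is why neither needs hand-written demonstrations.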
