

In-Context Principle Learning from Mistakes

February 8, 2024
Authors: Tianjun Zhang, Aman Madaan, Luyu Gao, Steven Zheng, Swaroop Mishra, Yiming Yang, Niket Tandon, Uri Alon
cs.AI

Abstract

In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting LLMs to downstream tasks, by learning from a few input-output examples. Nonetheless, all ICL-based approaches only learn from correct input-output pairs. In this paper, we revisit this paradigm, by learning more from the few given input-output examples. We introduce Learning Principles (LEAP): First, we intentionally induce the model to make mistakes on these few examples; then we reflect on these mistakes, and learn explicit task-specific "principles" from them, which help solve similar problems and avoid common mistakes; finally, we prompt the model to answer unseen test questions using the original few-shot examples and these learned general principles. We evaluate LEAP on a wide range of benchmarks, including multi-hop question answering (HotpotQA), textual QA (DROP), Big-Bench Hard reasoning, and math problems (GSM8K and MATH); in all these benchmarks, LEAP improves the strongest available LLMs such as GPT-3.5-turbo, GPT-4, GPT-4 Turbo, and Claude-2.1. For example, LEAP improves over standard few-shot prompting with GPT-4 by 7.5% on DROP, and by 3.3% on HotpotQA. Importantly, LEAP does not require any more input or examples than the standard few-shot prompting setting.
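The three-step procedure described in the abstract (induce mistakes, reflect into principles, prompt with examples plus principles) can be sketched against a generic text-in/text-out model call. This is a minimal illustration only, not the paper's actual prompt templates: the `llm` callable and all prompt wording here are hypothetical placeholders.

```python
from typing import Callable, List, Tuple


def leap_prompt(
    llm: Callable[[str], str],
    examples: List[Tuple[str, str]],
    test_question: str,
) -> str:
    """Minimal sketch of the LEAP loop; `llm` is any prompt -> text function."""
    principles = []
    # Step 1: intentionally let the model attempt the few-shot examples
    # without the gold answers, so it can make mistakes.
    for question, gold_answer in examples:
        attempt = llm(f"Answer this question:\n{question}")
        if attempt.strip() != gold_answer.strip():
            # Step 2: reflect on the mistake and extract an explicit,
            # task-specific principle (illustrative wording).
            principle = llm(
                f"Question: {question}\n"
                f"Wrong answer: {attempt}\n"
                f"Correct answer: {gold_answer}\n"
                "State a general principle that avoids this mistake."
            )
            principles.append(principle.strip())

    # Step 3: answer the unseen test question using the original few-shot
    # examples together with the learned principles.
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    principle_block = "\n".join(f"- {p}" for p in principles)
    final_prompt = (
        f"Principles:\n{principle_block}\n\n{shots}\n\nQ: {test_question}\nA:"
    )
    return llm(final_prompt)
```

Because principle extraction reuses only the given few-shot examples, this sketch also reflects the paper's claim that LEAP needs no more inputs than standard few-shot prompting.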