In-Context Principle Learning from Mistakes
February 8, 2024
Authors: Tianjun Zhang, Aman Madaan, Luyu Gao, Steven Zheng, Swaroop Mishra, Yiming Yang, Niket Tandon, Uri Alon
cs.AI
Abstract
In-context learning (ICL, also known as few-shot prompting) has been the
standard method of adapting LLMs to downstream tasks, by learning from a few
input-output examples. Nonetheless, all ICL-based approaches only learn from
correct input-output pairs. In this paper, we revisit this paradigm, by
learning more from the few given input-output examples. We introduce Learning
Principles (LEAP): First, we intentionally induce the model to make mistakes on
these few examples; then we reflect on these mistakes, and learn explicit
task-specific "principles" from them, which help solve similar problems and
avoid common mistakes; finally, we prompt the model to answer unseen test
questions using the original few-shot examples and these learned general
principles. We evaluate LEAP on a wide range of benchmarks, including multi-hop
question answering (HotpotQA), textual QA (DROP), Big-Bench Hard reasoning,
and math problems (GSM8K and MATH); in all these benchmarks, LEAP improves the
strongest available LLMs such as GPT-3.5-turbo, GPT-4, GPT-4 turbo and
Claude-2.1. For example, LEAP improves over the standard few-shot prompting
using GPT-4 by 7.5% in DROP, and by 3.3% in HotpotQA. Importantly, LEAP does
not require any more input or examples than the standard few-shot prompting
settings.
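The three LEAP steps described above (induce mistakes, reflect to extract principles, prompt with examples plus principles) can be sketched as a simple pipeline. This is a minimal illustration, not the authors' implementation: `call_llm` is a hypothetical stand-in for any chat-completion API, stubbed here with canned responses so the example runs offline, and the prompt wording is invented for illustration.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; stubbed with canned replies for illustration.
    In the real setting this would query e.g. GPT-4, sampling with high
    temperature in step 1 to elicit mistakes."""
    lower = prompt.lower()
    if "attempt this question" in lower:
        return "Mistaken answer: 12"
    if "state a general principle" in lower:
        return "Principle: re-check each arithmetic step before finalizing."
    return "Final answer: 14"

def leap(few_shot_examples, test_question):
    # Step 1: intentionally induce the model to make mistakes on the
    # given few-shot examples.
    mistakes = []
    for question, gold in few_shot_examples:
        attempt = call_llm(f"Attempt this question: {question}")
        if gold not in attempt:
            mistakes.append((question, attempt, gold))

    # Step 2: reflect on each mistake and extract an explicit,
    # task-specific principle from it.
    principles = [
        call_llm(
            f"Question: {q}\nWrong answer: {a}\nCorrect answer: {g}\n"
            "State a general principle that avoids this mistake."
        )
        for q, a, g in mistakes
    ]

    # Step 3: answer the unseen test question using the original
    # few-shot examples plus the learned principles.
    shots = "\n".join(f"Q: {q}\nA: {g}" for q, g in few_shot_examples)
    prompt = shots + "\n" + "\n".join(principles) + f"\nQ: {test_question}"
    return call_llm(prompt)

examples = [("What is 7 + 7?", "14")]
print(leap(examples, "What is 9 + 5?"))
```

Note that the final prompt costs no extra input examples beyond the standard few-shot setup; only the learned principles, generated once per task, are appended.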