In-Context Principle Learning from Mistakes

Abstract

In-context learning (ICL, also known as few-shot prompting) has been the standard method for adapting large language models (LLMs) to downstream tasks by learning from a few input-output examples. Nonetheless, all ICL-based approaches learn only from correct input-output pairs. In this paper, we revisit this paradigm by learning more from the same few input-output examples. We introduce Learning Principles (LEAP). First, we intentionally induce the model to make mistakes on these few examples. Then we reflect on these mistakes and learn explicit task-specific "principles" from them, which help solve similar problems and avoid common mistakes. Finally, we prompt the model to answer unseen test questions using the original few-shot examples together with these learned general principles. We evaluate LEAP on a wide range of benchmarks, including multi-hop question answering (HotpotQA), textual QA (DROP), Big-Bench Hard reasoning, and math problems (GSM8K and MATH). On all of these benchmarks, LEAP improves the strongest available LLMs, such as GPT-3.5-turbo, GPT-4, GPT-4 turbo, and Claude-2.1. For example, LEAP improves over standard few-shot prompting with GPT-4 by 7.5% on DROP and by 3.3% on HotpotQA. Importantly, LEAP does not require any more input or examples than the standard few-shot prompting setting.
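The three LEAP steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `llm` callable, the prompt wording, the exact-match check against the gold answer, the `samples_per_example` parameter, and the deduplication of principles are all assumptions made for illustration. The abstract does not specify how mistakes are induced or how principles are phrased.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]   # (question, gold answer)
LLM = Callable[[str], str]  # hypothetical: maps a prompt string to a completion


def format_examples(examples: List[Example]) -> str:
    """Render the few-shot examples as Q/A blocks."""
    return "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)


def learn_principles(
    llm: LLM, examples: List[Example], samples_per_example: int = 3
) -> List[str]:
    """Steps 1 and 2: collect mistakes on the few-shot examples, then
    ask the model to turn each mistake into a task-level principle."""
    principles: List[str] = []
    for question, gold in examples:
        for _ in range(samples_per_example):
            # Step 1 (assumed mechanism): answer without demonstrations and
            # keep only attempts that disagree with the gold answer.
            attempt = llm(f"Q: {question}\nA:").strip()
            if attempt == gold.strip():
                continue  # a correct attempt carries no mistake to learn from
            # Step 2: reflect on the mistake (prompt wording is illustrative).
            critique_prompt = (
                "The answer below is wrong. Explain the mistake and state one "
                "general principle that avoids it.\n"
                f"Q: {question}\nWrong answer: {attempt}\n"
                f"Correct answer: {gold}\nPrinciple:"
            )
            principle = llm(critique_prompt).strip()
            if principle and principle not in principles:
                principles.append(principle)
    return principles


def build_leap_prompt(
    examples: List[Example], principles: List[str], test_question: str
) -> str:
    """Step 3: combine the learned principles with the original few-shot
    examples and the unseen test question."""
    principle_block = "\n".join(f"- {p}" for p in principles)
    return (
        f"Principles:\n{principle_block}\n\n"
        f"{format_examples(examples)}\n\n"
        f"Q: {test_question}\nA:"
    )
```

In this sketch the test-time prompt reuses exactly the few-shot examples that produced the principles, which mirrors the abstract's point that LEAP needs no additional labeled input.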
