In-Context Principle Learning from Mistakes

Abstract

In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting LLMs to downstream tasks by learning from a few input-output examples. Nonetheless, all ICL-based approaches learn only from correct input-output pairs. In this paper, we revisit this paradigm by learning more from the few given input-output examples. We introduce Learning Principles (LEAP): first, we intentionally induce the model to make mistakes on these few examples; then we have the model reflect on these mistakes and learn explicit task-specific "principles" from them, which help solve similar problems and avoid common mistakes; finally, we prompt the model to answer unseen test questions using the original few-shot examples and these learned general principles. We evaluate LEAP on a wide range of benchmarks, including multi-hop question answering (HotpotQA), textual QA (DROP), Big-Bench Hard reasoning, and math problems (GSM8K and MATH); on all these benchmarks, LEAP improves the strongest available LLMs, such as GPT-3.5-turbo, GPT-4, GPT-4 Turbo, and Claude-2.1. For example, LEAP improves over standard few-shot prompting with GPT-4 by 7.5% on DROP and by 3.3% on HotpotQA. Importantly, LEAP does not require any more input or examples than the standard few-shot prompting setting.
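The three LEAP steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `llm` callable (prompt in, completion out), the prompt wording, the exact-match check for detecting a mistake, and the helper names `learn_principles` and `build_leap_prompt` are all assumptions made for this example. How mistakes are actually induced is not specified in the abstract; here the model simply answers each example question without seeing any examples.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]   # (question, gold answer)
LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def format_examples(examples: List[Example]) -> str:
    """Render the few-shot examples as Q/A pairs."""
    return "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)


def learn_principles(llm: LLM, examples: List[Example]) -> List[str]:
    """Steps 1 and 2: surface mistakes on the few-shot examples, then reflect."""
    principles: List[str] = []
    for question, gold in examples:
        # Step 1 (assumed mechanism): answer without examples to surface a mistake.
        attempt = llm(f"Q: {question}\nA:").strip()
        if attempt == gold:
            continue  # exact match (an assumption): no mistake to learn from
        # Step 2: ask the model to reflect on the mistake and state a principle.
        reflection = llm(
            f"Question: {question}\n"
            f"Incorrect answer: {attempt}\n"
            f"Correct answer: {gold}\n"
            "State one general principle that avoids this mistake:"
        )
        principles.append(reflection.strip())
    return principles


def build_leap_prompt(
    examples: List[Example], principles: List[str], test_question: str
) -> str:
    """Step 3: combine learned principles with the original few-shot examples."""
    parts: List[str] = []
    if principles:
        parts.append("Principles:\n" + "\n".join(f"- {p}" for p in principles))
    parts.append(format_examples(examples))
    parts.append(f"Q: {test_question}\nA:")
    return "\n\n".join(parts)
```

Note that the sketch uses only the same few-shot examples a standard prompt would use, which mirrors the abstract's point that LEAP needs no additional input. The principles are learned once per task and then reused for every test question.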
