In-Context Principle Learning from Mistakes

Abstract

In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting large language models (LLMs) to downstream tasks by learning from a few input-output examples. However, all ICL-based approaches learn only from correct input-output pairs. In this paper, we revisit this paradigm by learning more from the few given input-output examples. We introduce Learning Principles (LEAP): first, we intentionally induce the model to make mistakes on these few examples; then, we have the model reflect on these mistakes and learn explicit task-specific "principles" from them, which help solve similar problems and avoid common mistakes; finally, we prompt the model to answer unseen test questions using the original few-shot examples together with these learned general principles. We evaluate LEAP on a wide range of benchmarks, including multi-hop question answering (HotpotQA), textual QA (DROP), Big-Bench Hard reasoning, and math problems (GSM8K and MATH); on all of these benchmarks, LEAP improves the strongest available LLMs, such as GPT-3.5-turbo, GPT-4, GPT-4 turbo, and Claude-2.1. For example, with GPT-4, LEAP improves over standard few-shot prompting by 7.5% on DROP and by 3.3% on HotpotQA. Importantly, LEAP does not require any more input or examples than the standard few-shot prompting setting.
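
The three steps described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `llm` callable, the prompt wording, the exact-match check against the gold answer, and the number of samples per question are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]
Example = Tuple[str, str]          # (question, gold answer)
Mistake = Tuple[str, str, str]     # (question, wrong answer, gold answer)


def format_examples(examples: List[Example]) -> str:
    """Render the few-shot examples as a plain Q/A block."""
    return "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)


def collect_mistakes(llm: LLM, examples: List[Example], samples: int = 3) -> List[Mistake]:
    """Step 1: answer each few-shot question without help and keep wrong answers.

    Assumption: an answer counts as a mistake if it differs from the gold
    answer under exact string match.
    """
    mistakes = []
    for question, gold in examples:
        for _ in range(samples):
            answer = llm(f"Q: {question}\nA:").strip()
            if answer != gold:
                mistakes.append((question, answer, gold))
    return mistakes


def learn_principles(llm: LLM, mistakes: List[Mistake]) -> List[str]:
    """Step 2: ask the model to reflect on each mistake and state a principle."""
    principles: List[str] = []
    for question, wrong, gold in mistakes:
        prompt = (
            f"Question: {question}\n"
            f"Incorrect answer: {wrong}\n"
            f"Correct answer: {gold}\n"
            "Explain the mistake and state one general principle that avoids it."
        )
        principle = llm(prompt).strip()
        if principle and principle not in principles:  # drop empty and repeated principles
            principles.append(principle)
    return principles


def leap_answer(llm: LLM, examples: List[Example], principles: List[str], test_question: str) -> str:
    """Step 3: answer a test question using the principles plus the original examples."""
    principle_block = "\n".join(f"- {p}" for p in principles)
    prompt = (
        f"Principles:\n{principle_block}\n\n"
        f"{format_examples(examples)}\n\n"
        f"Q: {test_question}\nA:"
    )
    return llm(prompt).strip()
```

Note that the sketch uses only the few-shot examples already given, which matches the claim that no additional input or examples are needed; the principles are learned once and then reused for every test question.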
