In-Context Learning Creates Task Vectors
October 24, 2023
Authors: Roee Hendel, Mor Geva, Amir Globerson
cs.AI
Abstract
In-context learning (ICL) in Large Language Models (LLMs) has emerged as a
powerful new learning paradigm. However, its underlying mechanism is still not
well understood. In particular, it is challenging to map it to the "standard"
machine learning framework, where one uses a training set S to find a
best-fitting function f(x) in some hypothesis class. Here we make progress on
this problem by showing that the functions learned by ICL often have a very
simple structure: they correspond to the transformer LLM whose only inputs are
the query x and a single "task vector" calculated from the training set.
Thus, ICL can be seen as compressing S into a single task vector
theta(S) and then using this task vector to modulate the
transformer to produce the output. We support the above claim via comprehensive
experiments across a range of models and tasks.
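The abstract's central claim, that ICL factors into compressing the training set S into a single task vector theta(S) and then running the frozen model on the query conditioned on that vector alone, can be illustrated with a toy sketch. Everything below (the linear "model", the mean-pooled task vector, the byte-sum embedding) is a hypothetical stand-in chosen for brevity, not the paper's actual extraction procedure, which reads a hidden state from a transformer.

```python
import numpy as np

# Toy illustration of the two-step view of ICL described in the abstract:
#   theta = A(S)    -- compress the demonstrations into one task vector
#   y = f(x; theta) -- run the frozen model on the query, conditioned on theta
# All components here are hypothetical stand-ins, not the paper's method.

d = 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))  # frozen "model" weights (never updated)

def embed(token: str) -> np.ndarray:
    # Deterministic toy embedding: seed a generator from the token's bytes.
    r = np.random.default_rng(sum(token.encode()))
    return r.normal(size=d)

def compress(demos: list[tuple[str, str]]) -> np.ndarray:
    # theta(S): mean of demo representations. A stand-in for the hidden
    # state the paper extracts from the demonstration prompt.
    return np.mean([W @ embed(x) for x, _ in demos], axis=0)

def apply_model(x: str, theta: np.ndarray) -> np.ndarray:
    # f(x; theta): only the query and the task vector enter the model;
    # the demonstrations themselves are no longer needed.
    return W @ embed(x) + theta

demos = [("apple", "red"), ("banana", "yellow")]
theta = compress(demos)             # S is compressed once...
out = apply_model("cherry", theta)  # ...then reused for any query x
```

The point of the sketch is the interface, not the arithmetic: once `theta` is computed, new queries are answered without revisiting the demonstrations, which is exactly the compression claim the experiments in the paper test.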