Studying Large Language Model Generalization with Influence Functions

August 7, 2023
Authors: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman
cs.AI

Abstract

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
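For readers wanting the formal statement behind the abstract's counterfactual, the classical influence-function approximation can be written as follows (in the style of Koh and Liang, 2017; the notation below is generic rather than the paper's own):

```latex
% Influence of up-weighting a candidate training sequence z_m on a
% measurement f (e.g., the log-likelihood of a model completion),
% evaluated at the trained parameters \theta^\star.
% Generic notation, not the paper's exact formulation.
\mathcal{I}_f(z_m)
  = \left.\frac{\mathrm{d}\, f\bigl(\theta^\star(\epsilon)\bigr)}{\mathrm{d}\epsilon}\right|_{\epsilon=0}
  = -\,\nabla_\theta f(\theta^\star)^{\top}\,
     \mathbf{H}^{-1}\,
     \nabla_\theta \mathcal{L}\bigl(z_m, \theta^\star\bigr),
\qquad
\mathbf{H} = \nabla_\theta^{2}\,\frac{1}{N}\sum_{i=1}^{N}\mathcal{L}\bigl(z_i, \theta^\star\bigr).
```

The factor \(\mathbf{H}^{-1}\,\nabla_\theta f(\theta^\star)\) is the inverse-Hessian-vector product (IHVP) that makes naive computation intractable at LLM scale and that the paper approximates with EK-FAC; once the IHVP is computed for a query, scoring each candidate sequence reduces to a dot product with its gradient.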
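The abstract also names TF-IDF filtering as one way to shrink the pool of candidate training sequences before any per-sequence gradients are computed. Below is a minimal sketch of that idea, assuming scikit-learn is available; it is an illustration of the filtering step, not the paper's implementation, and the function name and `top_k` parameter are ours:

```python
# Illustrative sketch (not the paper's code): pre-filter candidate training
# sequences by TF-IDF similarity to a query, so the expensive gradient pass
# only runs on the most lexically relevant candidates.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_filter(query: str, candidates: list[str], top_k: int = 10) -> list[int]:
    """Return indices of the top_k candidates most similar to the query."""
    vectorizer = TfidfVectorizer()
    # Fit on candidates plus the query so both share a single vocabulary.
    matrix = vectorizer.fit_transform(candidates + [query])
    query_vec = matrix[-1]        # last row corresponds to the query
    candidate_vecs = matrix[:-1]  # remaining rows are the candidates
    scores = cosine_similarity(candidate_vecs, query_vec).ravel()
    # Highest-similarity candidates first.
    return np.argsort(scores)[::-1][:top_k].tolist()

# Influence scores would then be computed only for the surviving indices,
# cutting the number of per-sequence gradient computations.
```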