Studying Large Language Model Generalization with Influence Functions

August 7, 2023
作者: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman
cs.AI

Abstract

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
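
For reference, the counterfactual described above is usually formalized as follows. This is the standard influence-function expression (in the style of Koh & Liang, 2017); the notation here is assumed for illustration rather than copied from the paper. The influence of a candidate training sequence $z_m$ on a query quantity $f$ (e.g., the log-likelihood of a completion), evaluated at the trained parameters $\theta^\star$ of a model trained on $n$ sequences, is

$$
\mathcal{I}_f(z_m) = -\nabla_\theta f(\theta^\star)^{\top} \mathbf{H}^{-1} \nabla_\theta \mathcal{L}(z_m, \theta^\star),
\qquad
\mathbf{H} = \nabla_\theta^2 \, \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}(z_i, \theta^\star).
$$

The term $\mathbf{H}^{-1} \nabla_\theta f(\theta^\star)$ is the IHVP that makes naive computation intractable at LLM scale. The sketch below shows, for a single linear layer with synthetic statistics, how an EK-FAC-style IHVP can be applied: approximate the curvature by Kronecker factors, move into their shared eigenbasis, refit the eigenvalues from per-sample gradients, and divide. This is a minimal NumPy illustration of the technique, not the authors' 52B-parameter implementation; all array shapes and the damping value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 8, 6, 1000

# Synthetic per-sample statistics for one linear layer W (d_out x d_in):
# a_i are its input activations, delta_i the backpropagated gradients
# w.r.t. its pre-activation outputs, so the per-sample gradient of W
# is the outer product delta_i a_i^T.
A_samples = rng.normal(size=(n, d_in))
D_samples = rng.normal(size=(n, d_out))

# K-FAC Kronecker factors: uncentered covariances of activations
# and of pre-activation gradients.
A = A_samples.T @ A_samples / n    # (d_in, d_in)
S = D_samples.T @ D_samples / n    # (d_out, d_out)

# Eigendecompose each factor; kron(Q_S, Q_A) spans the K-FAC eigenbasis.
_, QA = np.linalg.eigh(A)
_, QS = np.linalg.eigh(S)

# EK-FAC's eigenvalue correction: refit the diagonal variances of the
# per-sample gradients in that eigenbasis, instead of using the
# Kronecker product of the factor eigenvalues.
Lam = np.zeros((d_out, d_in))
for a_i, d_i in zip(A_samples, D_samples):
    g_tilde = QS.T @ np.outer(d_i, a_i) @ QA
    Lam += g_tilde ** 2
Lam /= n

def ihvp(v, damping=1e-3):
    """Approximately solve (H + damping * I) x = v for a W-shaped v."""
    v_tilde = QS.T @ v @ QA              # rotate into the eigenbasis
    v_tilde = v_tilde / (Lam + damping)  # divide by corrected eigenvalues
    return QS @ v_tilde @ QA.T           # rotate back to parameter space

query_grad = rng.normal(size=(d_out, d_in))
print(ihvp(query_grad).shape)  # (6, 8)
```

Even with a fast IHVP, scoring every candidate still requires one gradient per training sequence, which motivates the TF-IDF filtering mentioned above: rank candidates by lexical overlap with the query and compute gradients only for the top hits. A minimal scikit-learn illustration follows; the helper name `prefilter`, the toy corpus, and the cutoff `k` are invented for this sketch, and the paper applies the idea at pretraining-corpus scale.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def prefilter(query, corpus, k=2):
    """Indices of the k training sequences with the highest TF-IDF cosine
    similarity to the query; only these get per-sequence gradients."""
    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(corpus)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs).ravel()
    return scores.argsort()[::-1][:k].tolist()

corpus = [
    "influence functions estimate the effect of one training example",
    "recipe for sourdough bread with a long cold fermentation",
    "inverse-Hessian-vector products via iterative solvers",
]
print(prefilter("computing inverse-Hessian-vector products", corpus, k=2))
```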