Backpack Language Models
May 26, 2023
Authors: John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang
cs.AI
Abstract
We present Backpacks: a new neural architecture that marries strong modeling
performance with an interface for interpretability and control. Backpacks learn
multiple non-contextual sense vectors for each word in a vocabulary, and
represent a word in a sequence as a context-dependent, non-negative linear
combination of sense vectors in this sequence. We find that, after training,
sense vectors specialize, each encoding a different aspect of a word. We can
interpret a sense vector by inspecting its (non-contextual, linear) projection
onto the output space, and intervene on these interpretable hooks to change the
model's behavior in predictable ways. We train a 170M-parameter Backpack
language model on OpenWebText, matching the loss of a GPT-2 small
(124M-parameter) Transformer. On lexical similarity evaluations, we find that
Backpack sense vectors outperform even a 6B-parameter Transformer LM's word
embeddings. Finally, we present simple algorithms that intervene on sense
vectors to perform controllable text generation and debiasing. For example, we
can edit the sense vocabulary to tend more towards a topic, or localize a
source of gender bias to a sense vector and globally suppress that sense.
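
The abstract compresses the architecture into a few clauses; the sketch below unpacks it. This is a minimal, hypothetical reconstruction, not the authors' released code: the module names, tensor shapes, and the `TinyScoreNet` stand-in for the paper's Transformer-based weight network are all assumptions. The idea it illustrates is the one the abstract states: each word type owns k non-contextual sense vectors, a contextualization network scores every (target position, source position, sense) triple, a softmax makes the weights non-negative, and each position's representation is the resulting weighted sum of sense vectors, read out through a single linear projection.

```python
# Minimal sketch of a Backpack LM's representation step (assumed API/shapes).
import torch
import torch.nn as nn


class TinyScoreNet(nn.Module):
    """Stand-in contextualization network (the paper uses a Transformer).
    Produces per-sense pairwise scores from token embeddings."""

    def __init__(self, vocab_size, d_model, n_senses):
        super().__init__()
        self.n_senses, self.d_model = n_senses, d_model
        self.emb = nn.Embedding(vocab_size, d_model)
        self.q = nn.Linear(d_model, n_senses * d_model, bias=False)
        self.k = nn.Linear(d_model, n_senses * d_model, bias=False)

    def forward(self, ids):
        b, t = ids.shape
        h = self.emb(ids)
        q = self.q(h).view(b, t, self.n_senses, self.d_model)
        k = self.k(h).view(b, t, self.n_senses, self.d_model)
        # (batch, n_senses, seq, seq): one attention-like score map per sense
        return torch.einsum('bild,bjld->blij', q, k) / self.d_model ** 0.5


class BackpackSketch(nn.Module):
    def __init__(self, vocab_size, d_model, n_senses, context_net):
        super().__init__()
        self.n_senses, self.d_model = n_senses, d_model
        # k non-contextual sense vectors per word type, stored flat
        self.senses = nn.Embedding(vocab_size, n_senses * d_model)
        self.context_net = context_net
        # non-contextual linear map to the vocabulary: used both for
        # next-word logits and for interpreting individual sense vectors
        self.out_proj = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, ids):  # ids: (batch, seq) token indices
        b, t = ids.shape
        c = self.senses(ids).view(b, t, self.n_senses, self.d_model)
        scores = self.context_net(ids)   # (batch, n_senses, seq, seq)
        # softmax over source positions -> non-negative weights; a real LM
        # would apply a causal mask to the scores before this softmax
        alpha = scores.softmax(dim=-1)
        # o[b, i] = sum over senses l and positions j of alpha * sense vector
        o = torch.einsum('blij,bjld->bid', alpha, c)
        return self.out_proj(o)          # (batch, seq, vocab_size)
```

Because the sense vectors and the output projection are both non-contextual, the interpretability and control interface the abstract describes reduces, in this sketch, to simple tensor operations: project one sense vector through the output matrix to see which words it promotes, or scale it toward zero to suppress it in every context (again, all indices and sizes below are illustrative assumptions).

```python
# Interpreting and intervening on one sense vector (illustrative values only).
model = BackpackSketch(50257, 768, 16, TinyScoreNet(50257, 768, 16))
logits = model(torch.randint(0, 50257, (1, 8)))           # (1, 8, 50257)

word_id, sense_idx = 1000, 3                              # arbitrary choices
sense = model.senses.weight[word_id].view(16, 768)[sense_idx]
top_words = model.out_proj(sense).topk(10).indices        # words it promotes
with torch.no_grad():                                     # suppress globally:
    model.senses.weight[word_id].view(16, 768)[sense_idx] *= 0.0
```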