

Backpack Language Models

May 26, 2023
作者: John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang
cs.AI

Abstract

We present Backpacks: a new neural architecture that marries strong modeling performance with an interface for interpretability and control. Backpacks learn multiple non-contextual sense vectors for each word in a vocabulary, and represent a word in a sequence as a context-dependent, non-negative linear combination of sense vectors in this sequence. We find that, after training, sense vectors specialize, each encoding a different aspect of a word. We can interpret a sense vector by inspecting its (non-contextual, linear) projection onto the output space, and intervene on these interpretable hooks to change the model's behavior in predictable ways. We train a 170M-parameter Backpack language model on OpenWebText, matching the loss of a GPT-2 small (124M-parameter) Transformer. On lexical similarity evaluations, we find that Backpack sense vectors outperform even a 6B-parameter Transformer LM's word embeddings. Finally, we present simple algorithms that intervene on sense vectors to perform controllable text generation and debiasing. For example, we can edit the sense vocabulary to tend more towards a topic, or localize a source of gender bias to a sense vector and globally suppress that sense.
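The core representation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the sizes (`vocab`, `k`, `d`, `n`) are made up, and the non-negative weights — which in a Backpack come from a contextualization network (a Transformer) — are stood in for by a softmax over random scores. It shows the defining equation, a position's representation as a non-negative combination of the sequence's non-contextual sense vectors, plus the interpret-and-suppress hook the abstract mentions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, k, d, n = 100, 4, 8, 5  # hypothetical sizes, not the paper's

# k non-contextual sense vectors per vocabulary word
C = rng.normal(size=(vocab, k, d))
x = rng.integers(0, vocab, size=n)  # a token sequence

# Non-negative weights: in the paper these are predicted in context by a
# Transformer; here random scores + softmax stand in for that network.
scores = rng.normal(size=(k, n, n))
alpha = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

# o[i] = sum over j, l of alpha[l, i, j] * C[x[j], l]
senses = C[x]                                # (n, k, d): senses of the sequence
o = np.einsum('lij,jld->id', alpha, senses)  # (n, d): one vector per position
assert np.all(alpha >= 0) and o.shape == (n, d)

# Interpret a sense via its (non-contextual, linear) projection onto the
# output space, then suppress it globally by zeroing it out everywhere.
E = rng.normal(size=(vocab, d))  # output (softmax) embedding matrix, assumed
logits = E @ C[x[0], 0]          # word scores this sense promotes
C[x[0], 0] = 0.0                 # global suppression of that sense
```

Because the sense vectors enter every prediction only through this linear, non-negative combination, zeroing one of them removes its contribution at every position — which is what makes the global-suppression debiasing algorithm in the abstract possible.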