Backpack Language Models

Abstract

We present Backpacks: a new neural architecture that marries strong modeling performance with an interface for interpretability and control. Backpacks learn multiple non-contextual sense vectors for each word in a vocabulary, and represent a word in a sequence as a context-dependent, non-negative linear combination of sense vectors in this sequence. We find that, after training, sense vectors specialize, each encoding a different aspect of a word. We can interpret a sense vector by inspecting its (non-contextual, linear) projection onto the output space, and intervene on these interpretable hooks to change the model's behavior in predictable ways. We train a 170M-parameter Backpack language model on OpenWebText, matching the loss of a GPT-2 small (124M-parameter) Transformer. On lexical similarity evaluations, we find that Backpack sense vectors outperform even a 6B-parameter Transformer LM's word embeddings. Finally, we present simple algorithms that intervene on sense vectors to perform controllable text generation and debiasing. For example, we can edit the sense vocabulary to tend more towards a topic, or localize a source of gender bias to a sense vector and globally suppress that sense.
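To make the representation described above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes toy sizes, random sense vectors, and random softmax-normalized weights standing in for the context-dependent weights a trained model would compute. All names (`sense_vectors`, `weights`, `output_matrix`) are hypothetical and chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes and token ids, assumed purely for illustration.
vocab_size, num_senses, dim = 10, 4, 8
seq = np.array([3, 7, 1])
n = len(seq)

# Non-contextual sense vectors: num_senses vectors for every vocabulary word.
sense_vectors = rng.normal(size=(vocab_size, num_senses, dim))

# Stand-in for the context-dependent weights (assumption): random scores passed
# through a softmax over source positions, so every weight is non-negative.
scores = rng.normal(size=(num_senses, n, n))
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

# senses_in_seq[j, l] is sense l of the word at position j.
senses_in_seq = sense_vectors[seq]  # shape (n, num_senses, dim)

# outputs[i] = sum over positions j and senses l of
#              weights[l, i, j] * senses_in_seq[j, l]
outputs = np.einsum("lij,jld->id", weights, senses_in_seq)

# Inspecting one sense vector: a linear, non-contextual projection onto the
# output space through a hypothetical output matrix, one score per word.
output_matrix = rng.normal(size=(vocab_size, dim))
sense_logits = output_matrix @ sense_vectors[3, 0]
top_words = np.argsort(-sense_logits)[:3]  # ids of the highest-scoring words
```

Because each output is a weighted sum of sense vectors, scaling down one sense's weights reduces that sense's contribution to every output linearly. That linearity is the property the abstract relies on when it describes interventions as predictable.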
