Backpack Language Models

Abstract

We present Backpacks: a new neural architecture that marries strong modeling performance with an interface for interpretability and control. Backpacks learn multiple non-contextual sense vectors for each word in a vocabulary, and represent a word in a sequence as a context-dependent, non-negative linear combination of the sense vectors in that sequence. We find that, after training, sense vectors specialize, each encoding a different aspect of a word. We can interpret a sense vector by inspecting its (non-contextual, linear) projection onto the output space, and intervene on these interpretable hooks to change the model's behavior in predictable ways. We train a 170M-parameter Backpack language model on OpenWebText, matching the loss of a GPT-2 small (124M-parameter) Transformer. On lexical similarity evaluations, we find that Backpack sense vectors outperform even a 6B-parameter Transformer LM's word embeddings. Finally, we present simple algorithms that intervene on sense vectors to perform controllable text generation and debiasing. For example, we can edit the sense vocabulary to tend more towards a topic, or localize a source of gender bias to a sense vector and globally suppress that sense.
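
To make the representation described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It forms each position's representation as a non-negative weighted sum of the sense vectors of the words in the sequence, then shows how zeroing one sense vector changes every output by an exactly predictable amount. The tensor shapes, the softmax used to obtain non-negative weights, the random placeholder weights (a trained model would compute them from context), and all names such as `backpack_representations` are assumptions made for illustration.

```python
import numpy as np

def backpack_representations(token_ids, sense_vectors, weights):
    """Return o[i] = sum_j sum_l weights[l, i, j] * sense_vectors[token_ids[j], l].

    token_ids:     (n,) integer word ids of the sequence
    sense_vectors: (vocab, k, d) non-contextual sense vectors, k per word
    weights:       (k, n, n) context-dependent, non-negative weights
    """
    if np.any(weights < 0):
        raise ValueError("contextualization weights must be non-negative")
    senses = sense_vectors[token_ids]                   # (n, k, d)
    return np.einsum("lij,jld->id", weights, senses)    # (n, d)

rng = np.random.default_rng(0)
vocab_size, num_senses, dim, seq_len = 10, 4, 8, 5      # hypothetical sizes
sense_vectors = rng.normal(size=(vocab_size, num_senses, dim))
token_ids = np.array([3, 1, 4, 1, 5])

# Placeholder weights (assumption): a softmax over random scores, which
# guarantees non-negativity. These are NOT learned from context here.
scores = rng.normal(size=(num_senses, seq_len, seq_len))
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

outputs = backpack_representations(token_ids, sense_vectors, weights)

# Intervention sketch: globally suppress sense 2 of word 1 by zeroing it.
edited = sense_vectors.copy()
edited[1, 2] = 0.0
edited_outputs = backpack_representations(token_ids, edited, weights)

# Because the combination is linear in the sense vectors, the change in each
# output is exactly that sense's weighted contribution.
positions = np.flatnonzero(token_ids == 1)
predicted_change = weights[2][:, positions].sum(axis=1)[:, None] * sense_vectors[1, 2]
```

Because the outputs are linear in the sense vectors, `outputs - edited_outputs` equals `predicted_change`, which is one way to read the abstract's claim that interventions on sense vectors change behavior in predictable ways.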
