Confidence Regulation Neurons in Language Models

Abstract

Despite their widespread use, the mechanisms by which large language models (LLMs) represent and regulate uncertainty in next-token predictions remain largely unexplored. This study investigates two critical components believed to influence this uncertainty: the recently discovered entropy neurons and a new set of components that we term token frequency neurons. Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits. Our work shows that entropy neurons operate by writing onto an unembedding null space, which lets them change the residual stream norm with minimal direct effect on the logits themselves. We observe entropy neurons across a range of models, up to 7 billion parameters. Token frequency neurons, which we discover and describe here for the first time, instead boost or suppress each token's logit in proportion to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution. Finally, we present a detailed case study in which entropy neurons actively manage confidence in the setting of induction, i.e., detecting and continuing repeated subsequences.
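The two mechanisms described in the abstract can be illustrated with a small toy sketch. This is not the paper's code or experimental setup: the 3-token vocabulary, the 4-dimensional residual stream, the matrix `W_U`, the vector `UNIGRAM`, and all helper names are hypothetical, and the final LayerNorm is simplified to a root-mean-square rescaling (mean-centering and learned gain omitted). Under those assumptions, writing along a direction the unembedding never reads raises the residual norm, so normalization shrinks every logit and the prediction becomes less confident, while adding a multiple of the log-frequency vector moves the output towards or away from the unigram distribution.

```python
import math

def rms_norm(x):
    """Simplified final normalization: rescale x to unit root-mean-square.
    Assumption: LayerNorm's mean-centering and learned gain/bias are omitted."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / rms for v in x]

def unembed(x, W_U):
    """Logits as W_U @ x, with W_U stored as one row per vocabulary token."""
    return [sum(w * v for w, v in zip(row, x)) for row in W_U]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def kl(p, q):
    """KL divergence KL(p || q) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical toy unembedding: 3 tokens, 4 residual dimensions.
# No row reads the last dimension, so it spans a null space of W_U.
W_U = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
]
NULL_DIR = [0.0, 0.0, 0.0, 1.0]

# Hypothetical unigram (token frequency) distribution over the 3 tokens.
UNIGRAM = [0.2, 0.5, 0.3]
LOG_FREQ = [math.log(f) for f in UNIGRAM]

def final_logits(resid):
    return unembed(rms_norm(resid), W_U)

resid = [3.0, 1.0, 0.5, 0.0]
base_logits = final_logits(resid)
p_base = softmax(base_logits)

# "Entropy neuron" sketch: add a large activation along the null direction.
# The pre-normalization logits are untouched, but the residual norm grows,
# so normalization scales all logits down by the same factor.
resid_written = [r + 6.0 * n for r, n in zip(resid, NULL_DIR)]
p_entropy = softmax(final_logits(resid_written))

# "Token frequency neuron" sketch: shift each logit by alpha * log frequency.
def shift_by_frequency(logits, alpha):
    return [l + alpha * lf for l, lf in zip(logits, LOG_FREQ)]

p_toward = softmax(shift_by_frequency(base_logits, 1.0))   # boost frequent tokens
p_away = softmax(shift_by_frequency(base_logits, -1.0))    # suppress frequent tokens

print("entropy base / after null-space write:", entropy(p_base), entropy(p_entropy))
print("KL to unigram (toward, base, away):",
      kl(p_toward, UNIGRAM), kl(p_base, UNIGRAM), kl(p_away, UNIGRAM))
```

In this toy setting the top-ranked token is the same before and after the null-space write; only the confidence changes, which is the qualitative behavior the abstract attributes to entropy neurons. The frequency shift, by contrast, acts on the logits directly and changes the relative preference between tokens.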
