
Confidence Regulation Neurons in Language Models

June 24, 2024
作者: Alessandro Stolfo, Ben Wu, Wes Gurnee, Yonatan Belinkov, Xingyi Song, Mrinmaya Sachan, Neel Nanda
cs.AI

Abstract

Despite their widespread use, the mechanisms by which large language models (LLMs) represent and regulate uncertainty in next-token predictions remain largely unexplored. This study investigates two critical components believed to influence this uncertainty: the recently discovered entropy neurons and a new set of components that we term token frequency neurons. Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits. Our work shows that entropy neurons operate by writing onto an unembedding null space, allowing them to impact the residual stream norm with minimal direct effect on the logits themselves. We observe the presence of entropy neurons across a range of models, up to 7 billion parameters. On the other hand, token frequency neurons, which we discover and describe here for the first time, boost or suppress each token's logit proportionally to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution. Finally, we present a detailed case study where entropy neurons actively manage confidence in the setting of induction, i.e. detecting and continuing repeated subsequences.
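
The two mechanisms described in the abstract can be made concrete with a small sketch. Below is a minimal, illustrative NumPy example (not the authors' code) of the quantities involved: the fraction of a single neuron's output weights lying in the effective null space of the unembedding matrix, and a centered log-frequency direction whose addition to the logits shifts the output distribution toward the unigram distribution. The function names, argument shapes, and the singular-value cutoff are assumptions made for illustration.

```python
import numpy as np

def null_space_fraction(w_out_neuron, W_U, svalue_cutoff=1e-2):
    """Fraction of a neuron's output-weight vector lying in the effective
    null space of the unembedding matrix W_U.

    w_out_neuron : (d_model,) output weights of one MLP neuron (assumed shape)
    W_U          : (d_model, vocab_size) unembedding matrix (assumed shape)
    svalue_cutoff: singular values below cutoff * max are treated as null
    """
    # Left singular vectors of W_U span its effective row space in d_model.
    U, S, _ = np.linalg.svd(W_U, full_matrices=False)
    row_space = U[:, S > S.max() * svalue_cutoff]        # (d_model, k_eff)
    proj = row_space @ (row_space.T @ w_out_neuron)      # component visible to W_U
    null_component = w_out_neuron - proj                 # component W_U cannot read
    return float(np.linalg.norm(null_component) ** 2
                 / np.linalg.norm(w_out_neuron) ** 2)

def token_frequency_direction(unigram_counts):
    """Centered log-frequency vector over the vocabulary: added to the logits,
    it moves the output distribution toward the unigram distribution;
    subtracted, it moves the distribution away from it."""
    freqs = unigram_counts / unigram_counts.sum()
    logfreq = np.log(freqs + 1e-12)                      # guard against zero counts
    return logfreq - logfreq.mean()
```

Under these assumptions, a candidate entropy neuron would combine an unusually large output-weight norm with a null-space fraction close to 1 (it mainly changes the residual stream norm, and hence the LayerNorm scale, rather than the logits directly), while a candidate token frequency neuron's output direction, mapped through the unembedding, would correlate strongly with the centered log-frequency vector.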
