Confidence Regulation Neurons in Language Models
June 24, 2024
Authors: Alessandro Stolfo, Ben Wu, Wes Gurnee, Yonatan Belinkov, Xingyi Song, Mrinmaya Sachan, Neel Nanda
cs.AI
Abstract
Despite their widespread use, the mechanisms by which large language models
(LLMs) represent and regulate uncertainty in next-token predictions remain
largely unexplored. This study investigates two critical components believed to
influence this uncertainty: the recently discovered entropy neurons and a new
set of components that we term token frequency neurons. Entropy neurons are
characterized by an unusually high weight norm and influence the final layer
normalization (LayerNorm) scale to effectively scale down the logits. Our work
shows that entropy neurons operate by writing onto an unembedding null space,
allowing them to impact the residual stream norm with minimal direct effect on
the logits themselves. We observe the presence of entropy neurons across a
range of models, up to 7 billion parameters. On the other hand, token frequency
neurons, which we discover and describe here for the first time, boost or
suppress each token's logit proportionally to its log frequency, thereby
shifting the output distribution towards or away from the unigram distribution.
Finally, we present a detailed case study where entropy neurons actively manage
confidence in the setting of induction, i.e. detecting and continuing repeated
subsequences.
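The two mechanisms described in the abstract can be illustrated with a small numerical sketch. This is a toy model with random weights and made-up dimensions, not the paper's actual setup: the RMS-style final normalization, the toy unigram frequencies, and the scaling coefficients are all assumptions for illustration. It shows (1) that writing onto the null space of the unembedding matrix inflates the residual-stream norm, so the final LayerNorm divides by more and every logit shrinks uniformly, raising entropy without changing the token ranking; and (2) that adding a multiple of the log-unigram-frequency vector to the logits reweights each token by its frequency, pushing the output distribution toward the unigram distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 8  # toy dimensions (assumption, not the paper's models)

# Unembedding matrix W_U maps the residual stream to logits.
W_U = rng.normal(size=(d_model, vocab))

# --- Entropy-neuron mechanism: writing onto the unembedding null space ---
# Since vocab < d_model here, W_U.T has a nontrivial null space; the last
# rows of Vt from the SVD span it.
_, _, Vt = np.linalg.svd(W_U.T)
null_dir = Vt[-1]  # unit vector with W_U.T @ null_dir ~ 0

x = rng.normal(size=d_model)  # residual stream before the final LayerNorm


def final_logits(resid):
    # Simplified final normalization (RMS-style, no learned parameters),
    # followed by the unembedding.
    scaled = resid * np.sqrt(d_model) / np.linalg.norm(resid)
    return W_U.T @ scaled


def entropy(logits):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.sum(p * np.log(p))


base = final_logits(x)
boosted = final_logits(x + 10.0 * null_dir)  # entropy neuron "fires"

# The null-space write inflates the residual norm, so normalization divides
# by more: every logit shrinks by the same factor and entropy rises, while
# the argmax (and the whole token ranking) is untouched.
assert np.argmax(boosted) == np.argmax(base)
assert np.linalg.norm(boosted) < np.linalg.norm(base)
assert entropy(boosted) > entropy(base)

# --- Token-frequency-neuron mechanism ---
# Adding alpha * log(freq) to the logits multiplies each token's probability
# by freq**alpha, moving the output distribution toward the unigram
# distribution (or away from it, for negative alpha).
freq = np.array([0.4, 0.2, 0.15, 0.1, 0.05, 0.05, 0.03, 0.02])  # toy unigrams
shifted = base + 100.0 * np.log(freq)  # strong push toward unigram statistics
assert np.argmax(shifted) == np.argmax(freq)
```

With a large positive coefficient the most frequent token dominates regardless of the original logits, while the entropy-neuron write leaves the prediction identity intact and only softens the model's confidence in it.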