The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models
May 26, 2025
Authors: Shashata Sawmya, Micah Adler, Nir Shavit
cs.AI
Abstract
This paper studies the emergence of interpretable categorical features within
large language models (LLMs), analyzing their behavior across training
checkpoints (time), transformer layers (space), and varying model sizes
(scale). Using sparse autoencoders for mechanistic interpretability, we
identify when and where specific semantic concepts emerge within neural
activations. Results indicate clear temporal and scale-specific thresholds for
feature emergence across multiple domains. Notably, spatial analysis reveals
unexpected semantic reactivation, with early-layer features re-emerging at
later layers, challenging standard assumptions about representational dynamics
in transformer models.
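
To make the term "sparse autoencoder" concrete, here is a minimal sketch of one common formulation: a ReLU encoder, a linear decoder, and an L1 sparsity penalty on the feature activations. It is not the authors' implementation. The function names, dimensions, random weights, and the `l1_coeff` value are all assumptions chosen for illustration, and the abstract does not state which variant or hyperparameters the paper uses.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations into non-negative features, then reconstruct them."""
    # ReLU keeps only positively activated features (hypothetical variant).
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    x_hat = f @ W_dec + b_dec
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 sparsity penalty."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity

# Illustrative sizes only: 8 activation vectors of width 16, 64 features.
rng = np.random.default_rng(0)
d_model, d_features, n = 16, 64, 8
x = rng.normal(size=(n, d_model))           # stand-in for model activations
W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))
b_enc = np.zeros(d_features)
b_dec = np.zeros(d_model)

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, f, x_hat)
```

In a study like the one described, an autoencoder of this kind would presumably be trained on activations taken from a given checkpoint, layer, and model size, so that the learned feature activations `f` can be compared across time, space, and scale.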