The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models
May 26, 2025
Authors: Shashata Sawmya, Micah Adler, Nir Shavit
cs.AI
Abstract
This paper studies the emergence of interpretable categorical features within
large language models (LLMs), analyzing their behavior across training
checkpoints (time), transformer layers (space), and varying model sizes
(scale). Using sparse autoencoders for mechanistic interpretability, we
identify when and where specific semantic concepts emerge within neural
activations. Results indicate clear temporal and scale-specific thresholds for
feature emergence across multiple domains. Notably, spatial analysis reveals
unexpected semantic reactivation, with early-layer features re-emerging at
later layers, challenging standard assumptions about representational dynamics
in transformer models.
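The abstract names sparse autoencoders as the interpretability tool but does not describe the variant used. As a rough illustration only, the sketch below assumes the common formulation: a ReLU encoder, a linear decoder, and a loss combining reconstruction error with an L1 sparsity penalty. All names here (`sae_forward`, `sae_loss`, `init_params`, `l1_coeff`) are hypothetical and not taken from the paper.

```python
import random


def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_params(d_model, d_hidden, seed=0):
    """Random small weights and zero biases (illustrative initialization only)."""
    rng = random.Random(seed)
    w_enc = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    w_dec = [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
    return w_enc, [0.0] * d_hidden, w_dec, [0.0] * d_model


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode an activation vector into sparse features, then reconstruct it."""
    # Assumed convention: subtract the decoder bias before encoding.
    centered = [a - b for a, b in zip(x, b_dec)]
    pre_act = [h + b for h, b in zip(matvec(w_enc, centered), b_enc)]
    features = relu(pre_act)
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


def sae_loss(x, features, recon, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    mse = sum((a - r) ** 2 for a, r in zip(x, recon)) / len(x)
    sparsity = sum(abs(f) for f in features)
    return mse + l1_coeff * sparsity


if __name__ == "__main__":
    params = init_params(d_model=4, d_hidden=8)
    activation = [1.0, -0.5, 0.25, 2.0]  # stand-in for one model activation vector
    feats, recon = sae_forward(activation, *params)
    print("active features:", sum(1 for f in feats if f > 0.0))
    print("loss:", sae_loss(activation, feats, recon))
```

In a study of this kind, such an autoencoder would be trained on activations collected at a given checkpoint, layer, and model size, and the learned features would then be inspected for the concepts they respond to. The training loop and feature labeling are omitted here because the abstract does not specify them.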