LLMs Can Get "Brain Rot"!
October 15, 2025
Authors: Shuo Xing, Junyuan Hong, Yifan Wang, Runjin Chen, Zhenyu Zhang, Ananth Grama, Zhengzhong Tu, Zhangyang Wang
cs.AI
Abstract
We propose and test the LLM Brain Rot Hypothesis: continual exposure to junk
web text induces lasting cognitive decline in large language models (LLMs). To
causally isolate data quality, we run controlled experiments on real Twitter/X
corpora, constructing junk and reversely controlled datasets via two orthogonal
operationalizations: M1 (engagement degree) and M2 (semantic quality), with
matched token scale and training operations across conditions. Compared with the
control group, continual pre-training of 4 LLMs on the junk dataset causes
non-trivial declines (Hedges' g > 0.3) in reasoning, long-context
understanding, and safety, and inflates "dark traits" (e.g., psychopathy,
narcissism). Gradual mixtures of junk and control datasets also yield
dose-response cognition decay: for example, under M1, ARC-Challenge with
Chain-of-Thought drops from 74.9 to 57.2 and RULER-CWE from 84.4 to 52.3
as the junk ratio rises from 0% to 100%.
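
The effect-size threshold quoted above uses Hedges' g, which is Cohen's d multiplied by a small-sample bias correction. The sketch below shows the standard textbook formula only; the function name and the example scores are hypothetical, and the abstract does not say how the authors group or aggregate scores before computing g.

```python
import math
from statistics import mean, variance


def hedges_g(treatment, control):
    """Hedges' g for two independent samples (standard approximate correction).

    `treatment` and `control` are sequences of scores. Both names are
    illustrative and not taken from the paper.
    """
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")

    # Pooled variance built from the unbiased (n - 1) sample variances.
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")

    # Cohen's d: standardized difference of the group means.
    d = (mean(treatment) - mean(control)) / math.sqrt(pooled_var)

    # Approximate small-sample correction factor J = 1 - 3 / (4(n1 + n2) - 9).
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


if __name__ == "__main__":
    # Made-up benchmark scores, for illustration only.
    junk_scores = [1.0, 2.0, 3.0]
    control_scores = [2.0, 3.0, 4.0]
    print(round(hedges_g(junk_scores, control_scores), 3))
```

A negative value here means the first group scored lower than the second; an absolute value above 0.3 is the level the abstract calls non-trivial.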
  Error forensics reveal several key insights. First, we identify
thought-skipping as the primary lesion: models increasingly truncate or skip
reasoning chains, explaining most of the error growth. Second, partial but
incomplete healing is observed: scaling instruction tuning and clean data
pre-training improve the declined cognition yet cannot restore baseline
capability, suggesting persistent representational drift rather than format
mismatch. Finally, we discover that a tweet's popularity, a non-semantic metric,
is a better indicator of the Brain Rot effect than its length in M1.
Together, the results provide significant, multi-perspective evidence that data
quality is a causal driver of LLM capability decay, reframing curation for
continual pre-training as a training-time safety problem and motivating
routine "cognitive health checks" for deployed LLMs.
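
The dose-response setup mentioned in the abstract, where the junk ratio rises from 0% to 100% at a matched token scale, can be pictured with a small mixing routine. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name, the `(text, n_tokens)` document representation, and the greedy fill strategy are all hypothetical.

```python
import random


def mix_by_token_budget(junk, control, junk_ratio, token_budget, seed=0):
    """Build a training mix with a fixed token budget and a target junk share.

    `junk` and `control` are lists of (text, n_tokens) pairs. All names and
    the greedy sampling strategy are assumptions made for illustration.
    Returns (mixed_docs, junk_tokens, control_tokens).
    """
    if not 0.0 <= junk_ratio <= 1.0:
        raise ValueError("junk_ratio must be between 0 and 1")

    rng = random.Random(seed)
    junk_quota = round(junk_ratio * token_budget)
    control_quota = token_budget - junk_quota

    def take_until(pool, quota):
        # Shuffle a copy, then greedily add documents that still fit the quota.
        docs = list(pool)
        rng.shuffle(docs)
        picked, used = [], 0
        for text, n_tokens in docs:
            if used + n_tokens <= quota:
                picked.append((text, n_tokens))
                used += n_tokens
        return picked, used

    junk_docs, junk_tokens = take_until(junk, junk_quota)
    control_docs, control_tokens = take_until(control, control_quota)

    # Interleave the two sources so training does not see them in blocks.
    mixed = junk_docs + control_docs
    rng.shuffle(mixed)
    return mixed, junk_tokens, control_tokens


if __name__ == "__main__":
    # Toy pools: ten 10-token documents per source.
    junk_pool = [(f"junk-{i}", 10) for i in range(10)]
    control_pool = [(f"control-{i}", 10) for i in range(10)]
    for ratio in (0.0, 0.2, 0.5, 0.8, 1.0):
        _, j, c = mix_by_token_budget(junk_pool, control_pool, ratio, 100)
        print(ratio, j, c)
```

Holding `token_budget` constant while sweeping `junk_ratio` keeps the total training volume fixed, so any change in downstream scores can be attributed to the composition of the data rather than its amount.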