LLMs Can Get "Brain Rot"!
October 15, 2025
Authors: Shuo Xing, Junyuan Hong, Yifan Wang, Runjin Chen, Zhenyu Zhang, Ananth Grama, Zhengzhong Tu, Zhangyang Wang
cs.AI
Abstract
We propose and test the LLM Brain Rot Hypothesis: continual exposure to junk
web text induces lasting cognitive decline in large language models (LLMs). To
causally isolate the effect of data quality, we run controlled experiments on
real Twitter/X corpora, constructing junk and reversely controlled datasets via
two orthogonal operationalizations, M1 (engagement degree) and M2 (semantic
quality), with token scale and training operations matched across conditions.
Relative to the control group, continual pre-training of four LLMs on the junk
dataset causes non-trivial declines (Hedges' g > 0.3) in reasoning, long-context
understanding, and safety, and inflates "dark traits" (e.g., psychopathy,
narcissism). Gradual mixtures of the junk and control datasets also yield a
dose-response cognitive decay: for example, under M1, ARC-Challenge with
Chain-of-Thought drops from 74.9 to 57.2 and RULER-CWE from 84.4 to 52.3 as the
junk ratio rises from 0% to 100%.
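
For context on the effect-size threshold above, here is a minimal, self-contained Python sketch of Hedges' g, the small-sample-corrected standardized mean difference (g > 0.3 marks a non-trivial decline). The formula is the standard one; the example scores are illustrative, not the paper's data.

```python
import math

def hedges_g(treatment, control):
    """Hedges' g: Cohen's d with a small-sample bias correction.

    g = J * (mean_t - mean_c) / s_pooled, where
    s_pooled = sqrt(((n_t-1)*s_t^2 + (n_c-1)*s_c^2) / (n_t + n_c - 2))
    and J ~= 1 - 3 / (4*(n_t + n_c) - 9).
    """
    n_t, n_c = len(treatment), len(control)
    m_t = sum(treatment) / n_t
    m_c = sum(control) / n_c
    var_t = sum((x - m_t) ** 2 for x in treatment) / (n_t - 1)
    var_c = sum((x - m_c) ** 2 for x in control) / (n_c - 1)
    s_pooled = math.sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c)
                         / (n_t + n_c - 2))
    d = (m_t - m_c) / s_pooled
    j = 1 - 3 / (4 * (n_t + n_c) - 9)  # small-sample correction factor
    return j * d

# Illustrative per-seed benchmark scores after junk vs. control pre-training;
# a decline shows up as a large-magnitude negative g.
junk_scores = [57.2, 58.1, 56.4, 57.9]
control_scores = [74.9, 73.8, 75.3, 74.2]
print(f"Hedges' g = {hedges_g(junk_scores, control_scores):.2f}")
```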
Error forensics reveals several key insights. First, we identify
thought-skipping as the primary lesion: models increasingly truncate or skip
reasoning chains, which explains most of the error growth. Second, we observe
partial but incomplete healing: scaling up instruction tuning and clean-data
pre-training improves the declined cognition yet cannot restore baseline
capability, suggesting persistent representational drift rather than a format
mismatch. Finally, we find that under M1 a tweet's popularity, a non-semantic
metric, is a better indicator of the Brain Rot effect than its length.
Together, these results provide significant, multi-perspective evidence that
data quality is a causal driver of LLM capability decay, reframing data
curation for continual pre-training as a training-time safety problem and
motivating routine "cognitive health checks" for deployed LLMs.
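
As one concrete reading of that closing recommendation, below is a minimal sketch of what a routine "cognitive health check" could look like: periodically re-score a deployed model on a fixed benchmark battery and flag regressions against a stored baseline. The task IDs, the `evaluate` callable, and the alert threshold are all hypothetical placeholders, not an API or procedure defined by the paper; the baseline numbers reuse the M1 figures quoted in the abstract.

```python
from typing import Callable, Dict

# Hypothetical battery mirroring the capability axes studied in the paper:
# reasoning, long-context understanding, and safety.
BATTERY = ["arc_challenge_cot", "ruler_cwe", "safety_probe"]  # placeholder IDs

def health_check(
    evaluate: Callable[[str], float],   # maps a task ID to a 0-100 score
    baseline: Dict[str, float],         # scores recorded at deployment time
    max_drop: float = 3.0,              # tolerated drop in points before alerting
) -> Dict[str, float]:
    """Re-score the battery and return tasks whose drop exceeds max_drop."""
    regressions = {}
    for task in BATTERY:
        drop = baseline[task] - evaluate(task)
        if drop > max_drop:
            regressions[task] = drop
    return regressions

# Usage sketch with a stubbed evaluator standing in for a real eval harness.
baseline = {"arc_challenge_cot": 74.9, "ruler_cwe": 84.4, "safety_probe": 90.0}
current = {"arc_challenge_cot": 57.2, "ruler_cwe": 52.3, "safety_probe": 88.5}
flagged = health_check(lambda task: current[task], baseline)
print(flagged)  # {'arc_challenge_cot': 17.7, 'ruler_cwe': 32.1}
```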