LLMs Can Get "Brain Rot"!

Abstract

We propose and test the LLM Brain Rot Hypothesis: continual exposure to junk web text induces lasting cognitive decline in large language models (LLMs). To causally isolate the effect of data quality, we run controlled experiments on real Twitter/X corpora, constructing junk and reversely controlled datasets via two orthogonal operationalizations, M1 (engagement degree) and M2 (semantic quality), with matched token scale and training operations across conditions. Compared with the control group, continual pre-training of four LLMs on the junk dataset causes non-trivial declines (Hedges' g > 0.3) in reasoning, long-context understanding, and safety, and inflates "dark traits" (e.g., psychopathy, narcissism). Gradual mixtures of junk and control data also yield a dose-response decay in cognition: for example, under M1, as the junk ratio rises from 0% to 100%, ARC-Challenge with Chain of Thought drops from 74.9 to 57.2 and RULER-CWE drops from 84.4 to 52.3. Error forensics reveal several key insights. First, we identify thought-skipping as the primary lesion: models increasingly truncate or skip reasoning chains, which explains most of the error growth. Second, we observe partial but incomplete healing: scaling up instruction tuning and clean-data pre-training improves the declined cognition but cannot restore baseline capability, suggesting persistent representational drift rather than format mismatch. Finally, we find that under M1 a tweet's popularity, a non-semantic metric, is a better indicator of the Brain Rot effect than its length. Together, the results provide significant, multi-perspective evidence that data quality is a causal driver of LLM capability decay, reframing data curation for continual pre-training as a training-time safety problem and motivating routine "cognitive health checks" for deployed LLMs.
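For readers unfamiliar with the effect-size threshold quoted above, the following is a minimal sketch of how Hedges' g is commonly computed for two independent groups. It is not the authors' evaluation code: the function name `hedges_g` and the score lists are hypothetical and chosen only for illustration, and the abstract does not specify how scores were grouped before the statistic was computed.

```python
import math
from statistics import mean, variance


def hedges_g(control, treated):
    """Hedges' g: Cohen's d with a small-sample bias correction."""
    n1, n2 = len(control), len(treated)
    # Pooled variance built from the unbiased sample variances of both groups.
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / (n1 + n2 - 2)
    # Cohen's d: standardized difference between the group means.
    d = (mean(control) - mean(treated)) / math.sqrt(pooled_var)
    # Common approximation of the small-sample correction factor J.
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * j


# Hypothetical benchmark scores (NOT taken from the paper).
control_scores = [74.0, 76.0, 75.0, 73.0]
junk_scores = [70.0, 72.0, 71.0, 69.0]

g = hedges_g(control_scores, junk_scores)
print(f"Hedges' g = {g:.3f}")
```

A positive value here means the control group scored higher than the junk-trained group; under the abstract's criterion, a decline counts as non-trivial when g exceeds 0.3.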
