LLMs Can Get "Brain Rot"!

October 15, 2025
Authors: Shuo Xing, Junyuan Hong, Yifan Wang, Runjin Chen, Zhenyu Zhang, Ananth Grama, Zhengzhong Tu, Zhangyang Wang
cs.AI

Abstract

We propose and test the LLM Brain Rot Hypothesis: continual exposure to junk web text induces lasting cognitive decline in large language models (LLMs). To causally isolate data quality, we run controlled experiments on real Twitter/X corpora, constructing junk and reverse-control datasets via two orthogonal operationalizations, M1 (engagement degree) and M2 (semantic quality), with matched token scale and training operations across conditions. Compared with the control group, continual pre-training of four LLMs on the junk dataset causes non-trivial declines (Hedges' g > 0.3) in reasoning, long-context understanding, and safety, and inflates "dark traits" (e.g., psychopathy, narcissism). Gradual mixtures of the junk and control datasets also yield dose-response cognition decay: for example, under M1, as the junk ratio rises from 0% to 100%, ARC-Challenge with Chain of Thought drops from 74.9 to 57.2 and RULER-CWE from 84.4 to 52.3. Error forensics reveal several key insights. First, we identify thought-skipping as the primary lesion: models increasingly truncate or skip reasoning chains, which explains most of the error growth. Second, we observe partial but incomplete healing: scaling up instruction tuning and clean-data pre-training improves the declined cognition yet cannot restore baseline capability, suggesting persistent representational drift rather than format mismatch. Finally, we find that under M1 a tweet's popularity, a non-semantic metric, is a better indicator of the Brain Rot effect than its length. Together, the results provide significant, multi-perspective evidence that data quality is a causal driver of LLM capability decay, reframing data curation for continual pre-training as a training-time safety problem and motivating routine "cognitive health checks" for deployed LLMs.
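The abstract reports effect sizes as Hedges' g without spelling out the statistic. As a rough illustration only, the sketch below computes the standard textbook form: Cohen's d on a pooled standard deviation, multiplied by the usual small-sample correction factor 1 - 3 / (4(n1 + n2) - 9). The function name `hedges_g` and the sample scores are hypothetical, and nothing here is taken from the authors' code; how the paper groups scores into the two samples is not stated in the abstract.

```python
import math
from statistics import mean, variance


def hedges_g(treatment, control):
    """Hedges' g: bias-corrected standardized mean difference.

    Illustrative sketch, not the authors' implementation.
    `treatment` and `control` are sequences of scores (assumed inputs).
    """
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")

    # Pooled variance from the two unbiased sample variances.
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")

    # Cohen's d: mean difference in pooled-standard-deviation units.
    d = (mean(treatment) - mean(control)) / math.sqrt(pooled_var)

    # Common approximation of the small-sample correction factor.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return correction * d


if __name__ == "__main__":
    # Made-up benchmark scores, purely for illustration.
    junk_scores = [58.0, 61.5, 55.2, 60.1]
    control_scores = [72.3, 74.9, 70.8, 73.5]
    print(f"Hedges' g = {hedges_g(junk_scores, control_scores):.2f}")
```

Under this sign convention a negative value means the junk-trained group scores lower than the control group, so a threshold such as g > 0.3 would be read on the magnitude of the difference.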