What do Language Models Learn and When? The Implicit Curriculum Hypothesis
April 9, 2026
Authors: Emmy Liu, Kaiser Sun, Millicent Li, Isabelle Lee, Lindia Tjuatja, Jen-tse Huang, Graham Neubig
cs.AI
Abstract
Large language models (LLMs) can perform remarkably complex tasks, yet the fine-grained details of how these capabilities emerge during pretraining remain poorly understood. Scaling laws on validation loss tell us how much a model improves with additional compute, but not which skills it acquires in what order. To remedy this, we propose the Implicit Curriculum Hypothesis: pretraining follows a compositional and predictable curriculum across models and data mixtures. We test this by designing a suite of simple, composable tasks spanning retrieval, morphological transformations, coreference, logical reasoning, and mathematics. Using these tasks, we track emergence points across four model families spanning 410M–13B parameters. We find that the orderings in which models reach fixed accuracy thresholds are strikingly consistent (Spearman ρ = .81 across 45 model pairs), and that composite tasks most often emerge after their component tasks. Furthermore, we find that this structure is encoded in model representations: tasks with similar function vector representations also tend to follow similar trajectories during training. Using the space of representations derived from our task set, we can effectively predict the training trajectories of simple held-out compositional tasks throughout pretraining (R² = .68–.84 across models) without evaluating them beforehand. Together, these results suggest that pretraining is more structured than loss curves reveal: skills emerge in a compositional order that is consistent across models and readable from model internals.