In-Place Test-Time Training
April 7, 2026
Authors: Guhao Feng, Shengjie Luo, Kai Hua, Ge Zhang, Di He, Wenhao Huang, Tianle Cai
cs.AI
Abstract
The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to the continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time, yet its potential in the current LLM ecosystem is hindered by critical barriers, including architectural incompatibility, computational inefficiency, and fast-weight objectives misaligned with language modeling. In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with test-time training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights, enabling a "drop-in" enhancement for LLMs without costly retraining from scratch. Furthermore, we replace TTT's generic reconstruction objective with a tailored, theoretically grounded objective explicitly aligned with the next-token-prediction task governing autoregressive language modeling. This principled objective, combined with an efficient chunk-wise update mechanism, yields a highly scalable algorithm compatible with context parallelism. Extensive experiments validate our framework's effectiveness: as an in-place enhancement, it enables a 4B-parameter model to achieve superior performance on tasks with contexts up to 128k tokens, and when pretrained from scratch, it consistently outperforms competitive TTT-related approaches. Ablation results provide deeper insights into our design choices. Collectively, our results establish In-Place TTT as a promising step towards a paradigm of continual learning in LLMs.
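The core mechanism described above — treating an MLP's final projection as a fast weight and updating it chunk-by-chunk against a next-token-prediction-style loss — can be illustrated with a minimal sketch. This is a hypothetical toy in NumPy, not the paper's implementation: the function names (`mlp`, `ttt_forward`), the ReLU MLP, and the squared-error surrogate objective are all assumptions for illustration; the paper's actual objective and update rule are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, chunk = 8, 16, 4            # model dim, hidden dim, chunk length

W1 = rng.normal(0, 0.1, (d, h))   # slow weight (frozen at inference)
W2 = rng.normal(0, 0.1, (h, d))   # fast weight: the MLP's final projection

def mlp(x, W2):
    # Standard two-layer ReLU MLP; only W2 is adapted at test time.
    return np.maximum(x @ W1, 0.0) @ W2

def ttt_forward(tokens, W2, lr=0.1):
    """Process a stream of token embeddings chunk-wise; after emitting
    each chunk's outputs, take one gradient step on the fast weight W2
    against a next-token surrogate loss (squared error here, as a
    stand-in for the paper's tailored objective)."""
    outs = []
    for start in range(0, len(tokens) - 1, chunk):
        x = tokens[start:start + chunk]           # chunk inputs
        y = tokens[start + 1:start + chunk + 1]   # next-token targets
        hidden = np.maximum(x @ W1, 0.0)
        pred = hidden @ W2
        outs.append(pred)
        # One gradient step on 0.5 * ||pred - y||^2 w.r.t. W2.
        grad = hidden.T @ (pred - y) / len(x)
        W2 = W2 - lr * grad                       # in-place fast-weight update
    return np.vstack(outs), W2

tokens = rng.normal(size=(17, d))                 # 16 input/target pairs
out, W2_adapted = ttt_forward(tokens, W2)
```

Because each chunk's outputs are computed before the update, causality is preserved within the stream, and the chunk granularity is what makes the scheme amenable to efficient batched and context-parallel execution.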