Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories
February 4, 2026
Authors: Sidi Lu, Zhenwen Liang, Dongyang Ma, Yan Wang, Haitao Mi, Dong Yu
cs.AI
Abstract
In this paper, we aim to bridge test-time training with a new type of parametric memory that can be flexibly offloaded from, or merged into, model parameters. We present Locas, a Locally-Supported parametric memory that shares the design of FFN blocks in modern transformers, allowing it to be flexibly permanentized into the model parameters while supporting efficient continual learning. We discuss two major variants of Locas: one with a conventional two-layer MLP design that has a clearer theoretical guarantee; the other shares the GLU-FFN structure of SOTA LLMs and can be easily attached to existing models for both parameter-efficient and computation-efficient continual learning. Crucially, we show that proper initialization of such low-rank sideway-FFN-style memories -- performed in a principled way by reusing model parameters, activations, and/or gradients -- is essential for fast convergence, improved generalization, and prevention of catastrophic forgetting. We validate the proposed memory mechanism on PG-19 whole-book language modeling and LoCoMo long-context dialogue question answering. With as little as 0.02% additional parameters, Locas-GLU stores information from past context while maintaining a much smaller context window. We also test the model's loss of general capability after memorizing a whole book with Locas, through comparative MMLU evaluation. Results show the promising ability of Locas to permanentize past context into parametric knowledge with minimal catastrophic forgetting of the model's existing internal knowledge.
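To make the architectural idea concrete, below is a minimal PyTorch sketch of a low-rank GLU-style side-FFN memory attached as a residual path next to a frozen transformer block. This is an illustrative assumption, not the paper's implementation: the class and parameter names (LocasGLUSketch, rank r) are hypothetical, and the zero-initialized down projection is only one simple way to leave the base model untouched at step zero, whereas the paper's principled initialization reuses the base model's parameters, activations, and/or gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocasGLUSketch(nn.Module):
    """Hypothetical low-rank GLU-style side-FFN memory (illustrative sketch only)."""

    def __init__(self, d_model: int, r: int = 16):
        super().__init__()
        # Low-rank analogue of a GLU-FFN block: gate, up, and down projections.
        self.w_gate = nn.Linear(d_model, r, bias=False)
        self.w_up = nn.Linear(d_model, r, bias=False)
        self.w_down = nn.Linear(r, d_model, bias=False)
        # Zero-init the down projection so the frozen base model's behavior is
        # unchanged before any test-time-training step. NOTE: this is an
        # assumption for illustration; the paper derives its initialization
        # from the base model's weights, activations, and/or gradients.
        nn.init.zeros_(self.w_down.weight)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # SwiGLU-style gating on a residual side path next to the original FFN.
        return h + self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))
```

Under this sketch, the side path costs roughly 3 * r * d_model parameters per attached layer, so a small rank keeps the overhead a tiny fraction of a multi-billion-parameter base model; the 0.02% figure quoted in the abstract corresponds to the paper's smallest configuration, whose exact rank and layer placement are not specified here.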