

Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns

February 25, 2026
Authors: Afshin Khadangi
cs.AI

Abstract

Continual learning is a core requirement for deployed language models, yet standard training and fine-tuning pipelines remain brittle under non-stationary data. Online updates often induce catastrophic forgetting, while methods that improve stability frequently increase latency, memory footprint, or dense computation in ways that do not scale well to long contexts. We introduce TRC² (Thalamically Routed Cortical Columns), a decoder-only backbone that addresses continual learning at the architectural level. TRC² combines sparse thalamic routing over cortical columns with mechanisms for modulation, prediction, memory, and feedback, together with a fast corrective pathway that supports rapid adaptation without destabilizing slower parameters. The resulting block is sparse and chunk-parallel, enabling efficient training and inference while preserving clean ablations of each subsystem. We instantiate a reproducible training and evaluation stack and a continual-learning harness that measures proxy forgetting under streaming domain shifts. Across language modeling and continual learning benchmarks, TRC² improves the stability-plasticity tradeoff at comparable compute, enabling rapid on-stream adaptation while preserving previously acquired behavior.
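The core mechanism the abstract describes — sparse "thalamic" routing over column modules plus a fast corrective pathway layered on stable slow weights — can be sketched in a few lines. This is a minimal illustration under my own assumptions (linear columns, top-k softmax gating, a delta-rule fast update), not the paper's actual implementation; all names and shapes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_COLUMNS, TOP_K = 8, 4, 2          # hidden size, columns, routed columns

slow_W = rng.standard_normal((N_COLUMNS, D, D)) * 0.1   # stable, never updated online
fast_W = np.zeros((N_COLUMNS, D, D))                    # plastic corrective pathway
router_W = rng.standard_normal((D, N_COLUMNS)) * 0.1    # "thalamic" router

def route(x):
    """Sparse routing: only the TOP_K highest-scoring columns process x."""
    logits = x @ router_W
    top = np.argsort(logits)[-TOP_K:]            # indices of selected columns
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over selected columns
    y = sum(g * (x @ (slow_W[i] + fast_W[i])) for g, i in zip(gates, top))
    return y, top

def fast_update(x, target, lr=0.1):
    """One corrective step: delta-rule update on the fast weights of the
    routed columns only; slow_W stays untouched (stability)."""
    y, top = route(x)
    err = target - y
    for i in top:
        fast_W[i] += lr * np.outer(x, err) / (x @ x)

x, target = rng.standard_normal(D), rng.standard_normal(D)
before = np.linalg.norm(target - route(x)[0])
for _ in range(20):
    fast_update(x, target)
after = np.linalg.norm(target - route(x)[0])
print(after < before)   # prints True: the fast pathway adapted on-stream
```

Because the routing and slow weights are fixed during the update loop, the fast pathway absorbs the new target while everything learned in `slow_W` is preserved — the stability-plasticity split the abstract refers to.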
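The abstract also mentions a harness that "measures proxy forgetting under streaming domain shifts." A common proxy (my assumption, not necessarily the paper's exact metric) is the gap between a domain's current evaluation loss and the best loss ever achieved on that domain during the stream:

```python
# Hypothetical forgetting proxy: for each domain, how much has eval loss
# regressed from the best value seen while streaming?
def proxy_forgetting(loss_history):
    """loss_history: dict mapping domain name -> list of eval losses over time."""
    return {d: losses[-1] - min(losses) for d, losses in loss_history.items()}

history = {
    "news": [2.1, 1.6, 1.5, 1.9, 2.0],  # trained early, later drifted away
    "code": [3.0, 2.9, 2.2, 1.8, 1.7],  # trained last, still improving
}
scores = proxy_forgetting(history)
print(scores)  # prints {'news': 0.5, 'code': 0.0}
```

A score of zero means the model is at (or past) its best on that domain; a positive score quantifies how much previously acquired behavior was lost, which is the quantity a stability-plasticity tradeoff aims to minimize.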