
Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns

February 25, 2026
Author: Afshin Khadangi
cs.AI

Abstract

Continual learning is a core requirement for deployed language models, yet standard training and fine-tuning pipelines remain brittle under non-stationary data. Online updates often induce catastrophic forgetting, while methods that improve stability frequently increase latency, memory footprint, or dense computation in ways that do not scale well to long contexts. We introduce TRC^{2} (Thalamically Routed Cortical Columns), a decoder-only backbone that addresses continual learning at the architectural level. TRC^{2} combines sparse thalamic routing over cortical columns with mechanisms for modulation, prediction, memory, and feedback, together with a fast corrective pathway that supports rapid adaptation without destabilizing slower parameters. The resulting block is sparse and chunk-parallel, enabling efficient training and inference while preserving clean ablations of each subsystem. We instantiate a reproducible training and evaluation stack and a continual-learning harness that measures proxy forgetting under streaming domain shifts. Across language modeling and continual learning benchmarks, TRC^{2} improves the stability-plasticity tradeoff at comparable compute, enabling rapid on-stream adaptation while preserving previously acquired behavior.
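The abstract does not describe the block in implementation detail. As a rough illustration of the general idea of sparse routing over column-like modules combined with a fast corrective pathway, here is a minimal PyTorch sketch. The class name `RoutedColumns`, the top-k gating rule, the low-rank corrective branch, and all dimensions are assumptions made for illustration; this is not the paper's implementation of TRC^{2}.

```python
# Minimal sketch (not the paper's implementation): top-k "thalamic" routing over
# a set of column-like feed-forward experts, plus a small fast corrective pathway.
# All module names, sizes, and the gating rule below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RoutedColumns(nn.Module):
    def __init__(self, d_model=256, n_columns=8, top_k=2, d_hidden=512, rank=8):
        super().__init__()
        self.top_k = top_k
        # Router ("thalamus"): scores each token against every column.
        self.router = nn.Linear(d_model, n_columns)
        # Columns: independent feed-forward blocks; only the top-k are evaluated per token.
        self.columns = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_columns)
        ])
        # Fast corrective pathway: a low-rank residual branch that can be updated
        # quickly on streaming data while the slower column parameters stay stable.
        self.fast_down = nn.Linear(d_model, rank, bias=False)
        self.fast_up = nn.Linear(rank, d_model, bias=False)
        nn.init.zeros_(self.fast_up.weight)  # start as a no-op residual

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                           # (B, T, n_columns)
        topv, topi = scores.topk(self.top_k, dim=-1)      # route each token to its top-k columns
        gates = F.softmax(topv, dim=-1)                   # normalized gate weights over selected columns
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topi[..., slot]                         # (B, T) column index per token
            gate = gates[..., slot].unsqueeze(-1)         # (B, T, 1)
            for c, column in enumerate(self.columns):
                mask = idx == c
                if mask.any():
                    out[mask] = out[mask] + gate[mask] * column(x[mask])
        # Slow routed output plus fast corrective residual.
        return x + out + self.fast_up(self.fast_down(x))


if __name__ == "__main__":
    block = RoutedColumns()
    y = block(torch.randn(2, 16, 256))
    print(y.shape)  # torch.Size([2, 16, 256])
```

In a continual-learning setting of the kind the abstract describes, one could freeze or slowly update the column and router parameters while optimizing only the low-rank corrective branch on the incoming stream; whether TRC^{2} partitions parameters this way is an assumption here, not something stated in the abstract.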