Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns

Abstract

Continual learning is a core requirement for deployed language models, yet standard training and fine-tuning pipelines remain brittle under non-stationary data. Online updates often induce catastrophic forgetting, while methods that improve stability frequently increase latency, memory footprint, or dense computation in ways that do not scale well to long contexts. We introduce TRC² (Thalamically Routed Cortical Columns), a decoder-only backbone that addresses continual learning at the architectural level. TRC² combines sparse thalamic routing over cortical columns with mechanisms for modulation, prediction, memory, and feedback, together with a fast corrective pathway that supports rapid adaptation without destabilizing slower parameters. The resulting block is sparse and chunk-parallel, enabling efficient training and inference while allowing each subsystem to be ablated cleanly. We instantiate a reproducible training and evaluation stack and a continual-learning harness that measures proxy forgetting under streaming domain shifts. Across language modeling and continual learning benchmarks, TRC² improves the stability-plasticity tradeoff at comparable compute, enabling rapid on-stream adaptation while preserving previously acquired behavior.
