Panini: Continual Learning in Token Space via Structured Memory
February 16, 2026
Authors: Shreyas Rajesh, Pavan Holur, Mehmet Yigit Turali, Chenda Duan, Vwani Roychowdhury
cs.AI
Abstract
Language models are increasingly used to reason over content they were not trained on, such as new documents, evolving knowledge, and user-specific data. A common approach is retrieval-augmented generation (RAG), which stores verbatim documents externally (as chunks) and retrieves only a relevant subset at inference time for an LLM to reason over. However, this results in inefficient usage of test-time compute (LLM repeatedly reasons over the same documents); moreover, chunk retrieval can inject irrelevant context that increases unsupported generation. We propose a human-like non-parametric continual learning framework, where the base model remains fixed, and learning occurs by integrating each new experience into an external semantic memory state that accumulates and consolidates itself continually. We present Panini, which realizes this by representing documents as Generative Semantic Workspaces (GSW) -- an entity- and event-aware network of question-answer (QA) pairs, sufficient for an LLM to reconstruct the experienced situations and mine latent knowledge via reasoning-grounded inference chains on the network. Given a query, Panini only traverses the continually-updated GSW (not the verbatim documents or chunks), and retrieves the most likely inference chains. Across six QA benchmarks, Panini achieves the highest average performance, 5%-7% higher than other competitive baselines, while using 2-30x fewer answer-context tokens, supports fully open-source pipelines, and reduces unsupported answers on curated unanswerable queries. The results show that efficient and accurate structuring of experiences at write time -- as achieved by the GSW framework -- yields both efficiency and reliability gains at read time. Code is available at https://github.com/roychowdhuryresearch/gsw-memory.
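To make the memory design concrete, below is a minimal, illustrative sketch of a GSW-style structure as described in the abstract: a network of entity/event nodes carrying question-answer (QA) pairs, written to incrementally at "write time" and traversed at "read time" to collect an inference chain. All class and method names here are hypothetical and chosen for illustration; they are not the Panini codebase's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class QAPair:
    question: str
    answer: str

@dataclass
class Node:
    name: str                                       # entity or event identifier
    qa_pairs: list = field(default_factory=list)    # knowledge attached to this node
    neighbors: list = field(default_factory=list)   # names of linked nodes

class GSW:
    """Toy stand-in for a Generative Semantic Workspace: a QA-pair graph."""

    def __init__(self):
        self.nodes: dict[str, Node] = {}

    def write(self, name: str, question: str, answer: str, links=()):
        """Integrate one new experience into the external memory state."""
        node = self.nodes.setdefault(name, Node(name))
        node.qa_pairs.append(QAPair(question, answer))
        for other in links:
            self.nodes.setdefault(other, Node(other))
            if other not in node.neighbors:
                node.neighbors.append(other)

    def read(self, start: str, depth: int = 2):
        """Traverse from a query-relevant node, collecting QA pairs along
        an inference chain (breadth-first, up to `depth` hops)."""
        chain, frontier, seen = [], [start], {start}
        for _ in range(depth):
            next_frontier = []
            for name in frontier:
                node = self.nodes.get(name)
                if node is None:
                    continue
                chain.extend(node.qa_pairs)
                for nb in node.neighbors:
                    if nb not in seen:
                        seen.add(nb)
                        next_frontier.append(nb)
            frontier = next_frontier
        return chain

# Write time: each document/experience updates the graph, not a chunk store.
gsw = GSW()
gsw.write("Alice", "Where does Alice work?", "At the observatory",
          links=["observatory"])
gsw.write("observatory", "Where is the observatory?", "On the hill")

# Read time: only the graph is traversed; the chain of QA pairs is the
# compact answer context handed to the LLM (no verbatim documents).
chain = gsw.read("Alice")
print([qa.answer for qa in chain])  # → ['At the observatory', 'On the hill']
```

The point of the sketch is the division of labor the abstract argues for: structuring happens once at write time, so read time only needs a cheap graph traversal yielding a small, targeted token context rather than re-reading full documents.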