Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces
November 10, 2025
Authors: Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury
cs.AI
Abstract
Large Language Models (LLMs) face fundamental challenges in long-context reasoning: many documents exceed their finite context windows, while performance on texts that do fit degrades with sequence length, necessitating augmentation with external memory frameworks. Current solutions, which have evolved from retrieval over semantic embeddings to more sophisticated structured knowledge-graph representations for improved sense-making and associativity, are tailored for fact-based retrieval and fail to build the space-time-anchored narrative representations required for tracking entities through episodic events. To bridge this gap, we propose the Generative Semantic Workspace (GSW), a neuro-inspired generative memory framework that builds structured, interpretable representations of evolving situations, enabling LLMs to reason over evolving roles, actions, and spatiotemporal contexts. Our framework comprises an Operator, which maps incoming observations to intermediate semantic structures, and a Reconciler, which integrates these into a persistent workspace that enforces temporal, spatial, and logical coherence. On the Episodic Memory Benchmark (EpBench) [huet_episodic_2025], with corpora ranging from 100k to 1M tokens, GSW outperforms existing RAG-based baselines by up to 20%. Furthermore, GSW is highly efficient, reducing query-time context tokens by 51% compared to the next most token-efficient baseline and thereby cutting inference-time costs considerably. More broadly, GSW offers a concrete blueprint for endowing LLMs with human-like episodic memory, paving the way for more capable agents that can reason over long horizons.
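To make the Operator/Reconciler division of labor concrete, here is a minimal sketch of the pattern the abstract describes. All names, data structures, and the query helper below are hypothetical illustrations, not the paper's implementation (where the Operator is LLM-driven): an Operator turns each observation into an intermediate semantic structure, and a Reconciler folds those structures into a persistent workspace that keeps per-entity timelines temporally ordered, so roles and locations can be tracked through an episode.

```python
from dataclasses import dataclass

@dataclass
class SemanticFrame:
    # Hypothetical intermediate structure for one observation:
    # entities with their roles, an action, and a space-time anchor.
    entities: dict   # entity name -> role in this observation
    action: str
    time: int        # discrete episode index
    location: str

class Operator:
    """Maps an incoming observation to a SemanticFrame.
    A stub for illustration; the paper uses an LLM for this step."""
    def parse(self, observation: dict) -> SemanticFrame:
        return SemanticFrame(
            entities=observation["entities"],
            action=observation["action"],
            time=observation["time"],
            location=observation["location"],
        )

class Reconciler:
    """Integrates frames into a persistent workspace, maintaining a
    temporally ordered timeline per entity so later queries can track
    evolving roles and locations."""
    def __init__(self):
        # entity -> list of (time, location, role, action), sorted by time
        self.workspace = {}

    def integrate(self, frame: SemanticFrame) -> None:
        for entity, role in frame.entities.items():
            timeline = self.workspace.setdefault(entity, [])
            timeline.append((frame.time, frame.location, role, frame.action))
            timeline.sort(key=lambda entry: entry[0])  # temporal coherence

    def where_was(self, entity: str, time: int):
        """Most recent known location of an entity at or before `time`."""
        past = [e for e in self.workspace.get(entity, []) if e[0] <= time]
        return past[-1][1] if past else None

# Usage: two observations about the same entity, whose role changes over time
op, rec = Operator(), Reconciler()
rec.integrate(op.parse({"entities": {"Alice": "witness"},
                        "action": "saw the crash",
                        "time": 1, "location": "Main St"}))
rec.integrate(op.parse({"entities": {"Alice": "interviewee"},
                        "action": "gave a statement",
                        "time": 3, "location": "police station"}))
print(rec.where_was("Alice", 2))  # -> Main St
```

At query time, only the relevant slice of the workspace (here, one entity's timeline) needs to be placed in the LLM's context, which is the intuition behind the token-efficiency claim in the abstract.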