ChatPaper.ai


Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces

November 10, 2025
作者: Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury
cs.AI

Abstract

Large Language Models (LLMs) face fundamental challenges in long-context reasoning: many documents exceed their finite context windows, and even on texts that do fit, performance degrades with sequence length, necessitating augmentation with external memory frameworks. Current solutions, which have evolved from retrieval over semantic embeddings to more sophisticated structured knowledge graph representations for improved sense-making and associativity, are tailored for fact-based retrieval and fail to build the space-time-anchored narrative representations required for tracking entities through episodic events. To bridge this gap, we propose the Generative Semantic Workspace (GSW), a neuro-inspired generative memory framework that builds structured, interpretable representations of evolving situations, enabling LLMs to reason over evolving roles, actions, and spatiotemporal contexts. Our framework comprises an Operator, which maps incoming observations to intermediate semantic structures, and a Reconciler, which integrates these into a persistent workspace that enforces temporal, spatial, and logical coherence. On the Episodic Memory Benchmark (EpBench; Huet et al., 2025), comprising corpora ranging from 100k to 1M tokens in length, GSW outperforms existing RAG-based baselines by up to 20%. Furthermore, GSW is highly efficient, reducing query-time context tokens by 51% compared to the next most token-efficient baseline, which considerably lowers inference-time costs. More broadly, GSW offers a concrete blueprint for endowing LLMs with human-like episodic memory, paving the way for more capable agents that can reason over long horizons.
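The Operator/Reconciler split described in the abstract can be illustrated with a minimal sketch. All names and data shapes below are hypothetical: in GSW both steps are LLM-driven, whereas here the Operator receives pre-structured observations and the Reconciler's "coherence" check is reduced to keeping each entity's timeline temporally ordered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticFrame:
    """Hypothetical intermediate semantic structure: an entity playing
    a role, performing an action, anchored in space and time."""
    entity: str
    role: str
    action: str
    location: str
    time: int


class Operator:
    """Maps an incoming observation to intermediate semantic frames.
    In the paper this mapping is generative (LLM-produced); here the
    observation is assumed to arrive pre-structured."""

    def extract(self, observation: dict) -> list[SemanticFrame]:
        return [SemanticFrame(**event) for event in observation["events"]]


class Reconciler:
    """Integrates frames into a persistent workspace, maintaining a
    temporally ordered timeline per entity (a stand-in for the paper's
    temporal, spatial, and logical coherence enforcement)."""

    def __init__(self) -> None:
        self.workspace: dict[str, list[SemanticFrame]] = {}

    def integrate(self, frames: list[SemanticFrame]) -> None:
        for frame in frames:
            timeline = self.workspace.setdefault(frame.entity, [])
            timeline.append(frame)
            timeline.sort(key=lambda f: f.time)  # keep temporal order

    def query(self, entity: str) -> list[SemanticFrame]:
        """Retrieve an entity's episodic timeline at query time."""
        return self.workspace.get(entity, [])
```

At query time, only the relevant entity timelines would be placed in the LLM's context, which is what makes the workspace far more token-efficient than re-retrieving raw passages.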
PDF · 82 · December 2, 2025