

MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems

October 16, 2025
Authors: Jihao Zhao, Zhiyuan Ji, Simin Niu, Hanyu Wang, Feiyu Xiong, Zhiyu Li
cs.AI

Abstract

The traditional RAG paradigm, which typically responds to received queries by comprehending relevant text chunks, inherently restricts both the depth of knowledge internalization and reasoning capability. To address this limitation, our research transforms text processing in RAG from passive chunking into proactive understanding, defining this process as document memory extraction, with the objective of simulating human cognitive processes during reading. Building on this, we propose the Mixtures of scenario-aware document Memories (MoM) framework, engineered to efficiently handle documents from multiple domains and to train small language models (SLMs) to proactively explore and construct document memories. MoM first instructs large language models (LLMs) to simulate domain experts in generating document logical outlines, thereby directing structured chunking and core content extraction. It employs a multi-path sampling and multi-perspective evaluation mechanism, designing comprehensive metrics that represent chunk clarity and extraction completeness to select the optimal document memories. Additionally, to infuse deeper human-like reading abilities during SLM training, we incorporate a reverse reasoning strategy that deduces refined expert thinking paths from high-quality outcomes. Finally, leveraging the diverse forms of content generated by MoM, we develop a three-layer document memory retrieval mechanism grounded in a theoretical proof from the perspective of probabilistic modeling. Extensive experiments across three distinct domains demonstrate that the MoM framework not only resolves the text chunking challenges of existing RAG systems, providing LLMs with semantically complete document memories, but also paves the way for SLMs to achieve human-centric intelligent text processing.
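The multi-path sampling and multi-perspective evaluation step described above can be sketched in miniature. The sketch below is an illustration, not the paper's implementation: the random-split chunker stands in for LLM-guided structured chunking, and the two toy scoring functions (`chunk_clarity`, `extraction_completeness`) and the equal 0.5/0.5 weighting are hypothetical placeholders for the paper's comprehensive metrics.

```python
import random

def sample_candidate(doc: str, rng: random.Random) -> list[str]:
    """Hypothetical stand-in for LLM-guided structured chunking:
    split the document at randomly chosen sentence boundaries."""
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    chunks, current = [], []
    for s in sentences:
        current.append(s)
        if rng.random() < 0.5:  # randomly close the current chunk
            chunks.append(". ".join(current) + ".")
            current = []
    if current:
        chunks.append(". ".join(current) + ".")
    return chunks

def chunk_clarity(chunks: list[str]) -> float:
    """Toy proxy for clarity: prefer evenly sized chunks
    (low relative variance of chunk lengths)."""
    if not chunks:
        return 0.0
    lengths = [len(c) for c in chunks]
    mean = sum(lengths) / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return 1.0 / (1.0 + variance / (mean ** 2 + 1e-9))

def extraction_completeness(chunks: list[str], doc: str) -> float:
    """Toy proxy for completeness: fraction of source text retained."""
    return min(1.0, sum(len(c) for c in chunks) / max(1, len(doc)))

def select_best_memory(doc: str, k: int = 8, seed: int = 0):
    """Multi-path sampling: draw k candidate document memories,
    score each from two perspectives, keep the best combined score."""
    rng = random.Random(seed)
    best_score, best_chunks = -1.0, []
    for _ in range(k):
        chunks = sample_candidate(doc, rng)
        score = 0.5 * chunk_clarity(chunks) + 0.5 * extraction_completeness(chunks, doc)
        if score > best_score:
            best_score, best_chunks = score, chunks
    return best_score, best_chunks
```

In the actual framework, each sampling path would be a full LLM-generated outline-plus-extraction, and the evaluation perspectives would be the paper's clarity and completeness metrics rather than these length-based proxies; only the select-the-argmax structure carries over.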
PDF · October 17, 2025