

When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

February 11, 2026
作者: Leheng Sheng, Yongtao Zhang, Wenchang Ma, Yaorui Shi, Ting Huang, Xiang Wang, An Zhang, Ke Shen, Tat-Seng Chua
cs.AI

Abstract

While reasoning over long context is crucial for many real-world applications, it remains challenging for large language models (LLMs), whose performance degrades as the context length grows. The recent MemAgent tackles this by processing the context chunk-by-chunk in an RNN-like loop and updating a textual memory used for final answering. However, this naive recurrent memory update has two crucial drawbacks: (i) the memory can quickly explode, because it is updated indiscriminately, even on evidence-free chunks; and (ii) the loop lacks an exit mechanism, leading to unnecessary computation even after sufficient evidence has been collected. To address these issues, we propose GRU-Mem, which incorporates two text-controlled gates for more stable and efficient long-context reasoning. Specifically, in GRU-Mem the memory is updated only when the update gate is open, and the recurrent loop exits immediately once the exit gate is open. To endow the model with these capabilities, we introduce two reward signals, r^{update} and r^{exit}, within end-to-end RL, rewarding correct updating and exiting behaviors respectively. Experiments on various long-context reasoning tasks demonstrate the effectiveness and efficiency of GRU-Mem, which generally outperforms the vanilla MemAgent with up to a 400% inference speedup.
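The gated loop described in the abstract can be sketched in plain Python. This is a minimal toy illustration, not the paper's method: in GRU-Mem both gates are text decisions produced by an LLM trained with the r^{update} and r^{exit} rewards, whereas here the gates are stood in by simple keyword heuristics, and all function names are hypothetical.

```python
def gated_memory_loop(chunks, evidence_keywords):
    """Process context chunk-by-chunk, updating a textual memory only when
    the update gate opens, and exiting the loop once the exit gate opens.

    `evidence_keywords` is a toy stand-in for the evidence the question
    requires; a real system would let the model judge relevance itself.
    """
    memory = []  # textual memory bank
    for i, chunk in enumerate(chunks):
        # Update gate: open only if the chunk appears to carry evidence,
        # so evidence-free chunks never inflate the memory.
        if any(kw in chunk for kw in evidence_keywords):
            memory.append(chunk)
        # Exit gate: open once every required piece of evidence is already
        # in memory, so remaining chunks are skipped entirely.
        if all(any(kw in m for m in memory) for kw in evidence_keywords):
            return memory, i + 1  # memory plus number of chunks processed
    return memory, len(chunks)
```

A vanilla MemAgent-style loop would instead update on every chunk and always run to `len(chunks)`; the two gates above are what yield the compact memory and the early-exit speedup the abstract claims.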