The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context
February 12, 2026
Authors: Xiaoyuan Liu, Tian Liang, Dongyang Ma, Deyu Zhou, Haitao Mi, Pinjia He, Yan Wang
cs.AI
Abstract
In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to be revisited later. In the world of AI, while we possess the Pensieve itself (mature databases and retrieval systems), our models inexplicably lack the "wand" to operate it. They remain like a Dumbledore without agency, passively accepting a manually engineered context as their entire memory. This work finally places the wand in the model's hand. We introduce StateLM, a new class of foundation models endowed with an internal reasoning loop to manage their own state. We equip our model with a suite of memory tools, such as context pruning, document indexing, and note-taking, and train it to actively manage these tools. By learning to dynamically engineer its own context, our model breaks free from the architectural prison of a fixed window. Experiments across various model sizes demonstrate StateLM's effectiveness in diverse scenarios. On long-document QA tasks, StateLMs consistently outperform standard LLMs across all model scales; on chat memory tasks, they achieve absolute accuracy improvements of 10% to 20% over standard LLMs. On the deep-research task BrowseComp-Plus, the gap widens further: StateLM reaches up to 52% accuracy, whereas standard LLM counterparts hover around 5%. Ultimately, our approach shifts LLMs from passive predictors to state-aware agents, turning reasoning into a stateful and manageable process.
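To make the "model manages its own context" idea concrete, the following is a minimal toy sketch of the kind of loop the abstract describes. The paper does not publish this API; every class, method, and variable name below is a hypothetical illustration of memory tools (note-taking, context pruning, recall) attached to a working context, not the authors' implementation.

```python
class StatefulContext:
    """Toy illustration (not the paper's API): a working context the
    model can edit via memory tools, plus a persistent note store
    that survives pruning of the raw context."""

    def __init__(self):
        self.context = []   # working context: list of (tag, text) entries
        self.notes = {}     # persistent notes, kept across pruning

    def add(self, tag, text):
        """Append raw material (e.g. a retrieved document chunk)."""
        self.context.append((tag, text))

    # --- memory tools the model would learn to invoke ---
    def take_note(self, key, text):
        """Distill a fact into persistent storage before pruning."""
        self.notes[key] = text

    def prune(self, tag):
        """Drop all context entries with the given tag to free the window."""
        self.context = [(t, x) for t, x in self.context if t != tag]

    def recall(self, key):
        """Re-inject a stored note when it becomes relevant again."""
        return self.notes.get(key)


# Usage: ingest a long document, keep only the distilled note.
ctx = StatefulContext()
ctx.add("doc", "…a very long chapter about the Pensieve…")
ctx.take_note("pensieve", "Dumbledore stores excess memories in a Pensieve.")
ctx.prune("doc")                 # raw chapter leaves the window
summary = ctx.recall("pensieve") # distilled fact remains available
```

The design point the abstract argues is visible even in this toy: after `prune`, the fixed-window cost of the raw document is gone, but the model-authored note still answers later questions.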