The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context
February 12, 2026
Authors: Xiaoyuan Liu, Tian Liang, Dongyang Ma, Deyu Zhou, Haitao Mi, Pinjia He, Yan Wang
cs.AI
Abstract
In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to be revisited later. In the world of AI, while we possess the Pensieve itself (mature databases and retrieval systems), our models inexplicably lack the "wand" to operate it. They remain like a Dumbledore without agency, passively accepting a manually engineered context as their entire memory. This work finally places the wand in the model's hand. We introduce StateLM, a new class of foundation models endowed with an internal reasoning loop to manage their own state. We equip our model with a suite of memory tools, such as context pruning, document indexing, and note-taking, and train it to actively manage these tools. By learning to dynamically engineer its own context, our model breaks free from the architectural prison of a fixed window. Experiments across various model sizes demonstrate StateLM's effectiveness in diverse scenarios. On long-document QA tasks, StateLMs consistently outperform standard LLMs across all model scales; on chat memory tasks, they achieve absolute accuracy improvements of 10% to 20% over standard LLMs. On the deep-research task BrowseComp-Plus, the gap widens further: StateLM reaches up to 52% accuracy, whereas standard LLM counterparts hover around 5%. Ultimately, our approach shifts LLMs from passive predictors to state-aware agents, turning reasoning into a stateful and manageable process.
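To make the "model manages its own context" idea concrete, the following is a minimal toy sketch of the kind of loop the abstract describes. The paper does not publish this API; every class, method, and variable name below is a hypothetical illustration of memory tools (note-taking, context pruning, recall) attached to a working context, not the authors' implementation.

```python
class StatefulContext:
    """Toy illustration (not the paper's API): a working context the
    model can edit via memory tools, plus a persistent note store
    that survives pruning of the raw context."""

    def __init__(self):
        self.context = []   # working context: list of (tag, text) entries
        self.notes = {}     # persistent notes, kept across pruning

    def add(self, tag, text):
        """Append raw material (e.g. a retrieved document chunk)."""
        self.context.append((tag, text))

    # --- memory tools the model would learn to invoke ---
    def take_note(self, key, text):
        """Distill a fact into persistent storage before pruning."""
        self.notes[key] = text

    def prune(self, tag):
        """Drop all context entries with the given tag to free the window."""
        self.context = [(t, x) for t, x in self.context if t != tag]

    def recall(self, key):
        """Re-inject a stored note when it becomes relevant again."""
        return self.notes.get(key)


# Usage: ingest a long document, keep only the distilled note.
ctx = StatefulContext()
ctx.add("doc", "…a very long chapter about the Pensieve…")
ctx.take_note("pensieve", "Dumbledore stores excess memories in a Pensieve.")
ctx.prune("doc")                 # raw chapter leaves the window
summary = ctx.recall("pensieve") # distilled fact remains available
```

The design point the abstract argues is visible even in this toy: after `prune`, the fixed-window cost of the raw document is gone, but the model-authored note still answers later questions.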