

The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

February 12, 2026
Authors: Xiaoyuan Liu, Tian Liang, Dongyang Ma, Deyu Zhou, Haitao Mi, Pinjia He, Yan Wang
cs.AI

Abstract

In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to be revisited later. In the world of AI, while we possess the Pensieve itself, in the form of mature databases and retrieval systems, our models lack the "wand" to operate it. They remain like a Dumbledore without agency, passively accepting a manually engineered context as their entire memory. This work finally places the wand in the model's hand. We introduce StateLM, a new class of foundation models endowed with an internal reasoning loop for managing their own state. We equip the model with a suite of memory tools, such as context pruning, document indexing, and note-taking, and train it to use these tools actively. By learning to dynamically engineer its own context, the model breaks free of the architectural prison of a fixed window. Experiments across multiple model sizes demonstrate StateLM's effectiveness in diverse scenarios. On long-document QA tasks, StateLMs consistently outperform standard LLMs at every model scale; on the chat memory task, they achieve absolute accuracy improvements of 10% to 20% over standard LLMs. On the deep research task BrowseComp-Plus, the gap is even more pronounced: StateLM reaches up to 52% accuracy, whereas standard LLM counterparts hover around 5%. Ultimately, our approach shifts LLMs from passive predictors to state-aware agents, making reasoning a stateful and manageable process.
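To make the abstract's architecture concrete, here is a minimal sketch of the kind of internal loop it describes: a model state exposed through memory tools (context pruning, document indexing, note-taking) that the model invokes instead of passively consuming a fixed window. All names (`State`, `prune_context`, `index_document`, `take_note`, the fixed policy in `reasoning_step`) are illustrative assumptions, not the paper's actual API; in StateLM the model itself would learn when to call each tool.

```python
# Hypothetical sketch of a StateLM-style stateful reasoning loop.
# Tool names and the hand-written policy below are illustrative only.
from dataclasses import dataclass, field


@dataclass
class State:
    context: list[str] = field(default_factory=list)      # active window
    notes: list[str] = field(default_factory=list)        # persistent notes
    index: dict[str, str] = field(default_factory=dict)   # doc id -> text

    def prune_context(self, keep_last: int) -> None:
        """Drop all but the most recent `keep_last` context entries."""
        self.context = self.context[-keep_last:]

    def index_document(self, doc_id: str, text: str) -> None:
        """Store a document outside the window for later retrieval."""
        self.index[doc_id] = text

    def take_note(self, note: str) -> None:
        """Persist a distilled fact that survives context pruning."""
        self.notes.append(note)


def reasoning_step(state: State, observation: str) -> None:
    # A trained StateLM would choose the tool call itself; this fixed
    # policy stands in for that choice: note key facts, bound the window.
    state.context.append(observation)
    if "KEY:" in observation:
        state.take_note(observation.split("KEY:", 1)[1].strip())
    if len(state.context) > 4:
        state.prune_context(keep_last=4)


state = State()
for obs in ["intro", "KEY: answer is 42", "detail a", "detail b",
            "detail c", "detail d"]:
    reasoning_step(state, obs)

print(len(state.context))  # window stays bounded at 4
print(state.notes)         # distilled fact survives pruning
```

The point of the sketch is the separation the abstract draws: the context window becomes a managed, bounded resource, while durable information lives in notes and an index the model controls.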