The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

Abstract

In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to revisit them later. In the world of AI, we already possess the Pensieve (mature databases and retrieval systems), yet our models inexplicably lack the "wand" to operate it. They remain like a Dumbledore without agency, passively accepting a manually engineered context as their entire memory. This work finally places the wand in the model's hand. We introduce StateLM, a new class of foundation models endowed with an internal reasoning loop for managing their own state. We equip our model with a suite of memory tools, such as context pruning, document indexing, and note-taking, and train it to manage these tools actively. By learning to dynamically engineer its own context, our model breaks free from the architectural prison of a fixed context window. Experiments at multiple model sizes demonstrate StateLM's effectiveness in diverse scenarios. On long-document QA tasks, StateLMs consistently outperform standard LLMs at all model scales; on the chat memory task, they achieve absolute accuracy improvements of 10% to 20% over standard LLMs. On the deep research task BrowseComp-Plus, the performance gap becomes even more pronounced: StateLM achieves up to 52% accuracy, whereas its standard LLM counterparts remain at around 5%. Ultimately, our approach shifts LLMs from passive predictors to state-aware agents, for which reasoning becomes a stateful and manageable process.
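The abstract describes a model that manages its own context through memory tools such as context pruning, document indexing, and note-taking. The Python sketch that follows illustrates what such tools could look like as operations on an explicit context state. Everything in it is an assumption for illustration only: `ContextState`, its method names, the word-count proxy for tokens, the keyword-overlap search, and the hand-written pruning rule are hypothetical and are not StateLM's actual tool interface or its learned policy.

```python
from dataclasses import dataclass, field


@dataclass
class ContextState:
    """Hypothetical agent-editable context; NOT StateLM's actual interface.

    Length is approximated by counting whitespace-separated words (an assumption).
    """

    budget: int  # max words the rendered context should hold
    messages: list[str] = field(default_factory=list)  # live turns inside the window
    notes: dict[str, str] = field(default_factory=dict)  # notes that survive pruning
    index: dict[str, list[str]] = field(default_factory=dict)  # doc_id -> chunks kept outside the window

    def size(self) -> int:
        # Words currently occupying the window: notes plus live messages.
        return sum(len(t.split()) for t in [*self.notes.values(), *self.messages])

    def over_budget(self) -> bool:
        return self.size() > self.budget

    def index_document(self, doc_id: str, text: str, chunk_words: int = 50) -> int:
        # Keep a long document outside the window as fixed-size word chunks.
        words = text.split()
        self.index[doc_id] = [
            " ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)
        ]
        return len(self.index[doc_id])

    def search(self, query: str, k: int = 1) -> list[str]:
        # Naive keyword-overlap ranking; a stand-in for BM25 or embedding retrieval.
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(chunk.lower().split())), chunk)
            for chunks in self.index.values()
            for chunk in chunks
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep document order
        return [chunk for score, chunk in scored[:k] if score > 0]

    def write_note(self, key: str, text: str) -> None:
        # Condense something worth keeping into a named note.
        self.notes[key] = text

    def prune(self, keep_last: int) -> int:
        # Drop all but the newest `keep_last` messages; return how many were dropped.
        dropped = max(len(self.messages) - keep_last, 0)
        self.messages = self.messages[dropped:]
        return dropped

    def render(self) -> str:
        # The context the next model step would actually read.
        return "\n".join([f"[note:{k}] {v}" for k, v in self.notes.items()] + self.messages)


# Usage: a hand-written policy standing in for the learned one.
state = ContextState(budget=16)
n_chunks = state.index_document(
    "manual", "the pensieve stores memories outside the mind so they can be revisited later", chunk_words=5
)
state.messages += [
    "user: please read the manual",
    "assistant: indexing it now",
    "user: where are memories stored?",
]
hit = state.search("where are memories stored")
state.messages.append(f"tool: {hit[0]}")
if state.over_budget():
    state.write_note("task", "answer where memories are stored")
    state.prune(keep_last=1)
print(state.render())
# [note:task] answer where memories are stored
# tool: the pensieve stores memories outside
```

In StateLM, the decision of when to index, search, take notes, or prune is made by the model itself inside its reasoning loop; the `if state.over_budget()` branch here merely stands in for that trained behavior so the example stays deterministic.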
