Rethinking How to Remember: Beyond Atomic Facts in Lifelong LLM Agent Memory

Abstract

To enable reliable long-term interaction, LLM agents require a memory system that can faithfully store, efficiently retrieve, and deeply reason over accumulated dialogue history. Most existing methods adopt an extracted-fact-based paradigm: handcrafted static prompts compress raw dialogues into atomic facts, which are then stored, matched, and injected into downstream reasoning. However, such fact-centric designs inevitably discard fine-grained details of the original dialogues and fail to support deep reasoning over scattered, isolated facts. Moreover, static prompts cannot maintain a consistent extraction granularity across diverse dialogue styles. To address these limitations, we propose TriMem, which maintains three coexisting representation granularities: raw dialogue segments anchored by source identifiers for storage fidelity, extracted atomic facts for efficient memory retrieval, and synthesized profiles that aggregate dispersed facts into a holistic semantic understanding for deep reasoning. We further adopt TextGrad-based prompt optimization, which iteratively refines the extraction and profiling prompts via response-quality feedback, achieving lifelong evolution without any parameter updates. Extensive experiments on LoCoMo and PerLTQA across multiple LLM backbones demonstrate that TriMem consistently outperforms strong memory baselines. The code is available at https://TMLR-TriMem.github.io.
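The three coexisting granularities can be pictured with a minimal sketch. This is not the TriMem implementation: the class, field, and method names are hypothetical, and the `entity` key used to group facts into profiles is an added assumption. The caller supplies already-extracted facts where the paper uses an LLM extraction prompt, profile synthesis is replaced by plain concatenation, and retrieval is a naive word-overlap ranking.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    text: str
    source_id: str  # identifier of the raw segment this fact came from
    entity: str     # hypothetical grouping key used to build profiles


@dataclass
class TriGranularMemory:
    segments: dict[str, str] = field(default_factory=dict)  # source_id -> raw dialogue
    facts: list[Fact] = field(default_factory=list)         # atomic facts
    profiles: dict[str, str] = field(default_factory=dict)  # entity -> profile text

    def add_segment(self, source_id: str, text: str,
                    extracted: list[tuple[str, str]]) -> None:
        """Store a raw segment plus the (entity, fact) pairs extracted from it."""
        self.segments[source_id] = text
        for entity, fact_text in extracted:
            self.facts.append(Fact(fact_text, source_id, entity))
            self._refresh_profile(entity)

    def _refresh_profile(self, entity: str) -> None:
        # Stand-in for LLM-based profile synthesis: concatenate the entity's facts.
        self.profiles[entity] = " ".join(
            f.text for f in self.facts if f.entity == entity
        )

    def retrieve(self, query: str, k: int = 1) -> list[dict[str, str]]:
        """Rank facts by word overlap, then attach the raw segment and profile."""
        q = set(query.lower().split())
        ranked = sorted(
            self.facts,
            key=lambda f: len(q & set(f.text.lower().split())),
            reverse=True,
        )
        return [
            {
                "fact": f.text,
                "segment": self.segments[f.source_id],
                "profile": self.profiles[f.entity],
            }
            for f in ranked[:k]
        ]


memory = TriGranularMemory()
memory.add_segment(
    "s1",
    "Alice: I adopted a beagle named Max last spring.",
    [("Alice", "Alice adopted a beagle named Max.")],
)
memory.add_segment(
    "s2",
    "Alice: I run every morning before work.",
    [("Alice", "Alice runs every morning.")],
)
hit = memory.retrieve("What beagle did Alice adopt?")[0]
```

In this sketch the matched fact serves as the retrieval key, the source identifier recovers the full raw segment (including "last spring", which the fact dropped), and the profile brings in related facts that the query did not match directly.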