MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

Abstract

As Large Language Models (LLMs) are increasingly used for long-duration tasks, maintaining effective long-term memory has become a critical challenge. Current methods typically face a trade-off between cost and accuracy. Simple storage methods often fail to retrieve relevant information, while complex indexing methods (such as memory graphs) require heavy computation and can cause information loss. Furthermore, relying on the working LLM to process all memories is computationally expensive and slow. To address these limitations, we propose MemSifter, a novel framework that offloads the memory retrieval process to a small-scale proxy model. Instead of increasing the burden on the primary working LLM, MemSifter uses a smaller model to reason about the task before retrieving the necessary information. This approach requires no heavy computation during the indexing phase and adds minimal overhead during inference. To optimize the proxy model, we introduce a memory-specific Reinforcement Learning (RL) training paradigm. We design a task-outcome-oriented reward based on the working LLM's actual performance in completing the task. The reward measures the actual contribution of retrieved memories through multiple interactions with the working LLM, and differentiates retrieval rankings by stepwise decreasing contributions. Additionally, we employ training techniques such as Curriculum Learning and Model Merging to improve performance. We evaluated MemSifter on eight LLM memory benchmarks, including Deep Research tasks. The results demonstrate that our method meets or exceeds the performance of existing state-of-the-art approaches in both retrieval accuracy and final task completion. MemSifter offers an efficient and scalable solution for long-term LLM memory. We have open-sourced the model weights, code, and training data to support further research.
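
The abstract does not give the reward formula, so the following is only a minimal sketch of what a task-outcome-oriented reward with stepped weights might look like. Everything in it is an assumption made for illustration: the function name `outcome_reward`, the `task_score` callback (a stand-in for running the working LLM on the task with a given set of memories and scoring the result), and the cutoff and weight values. It is not the authors' implementation.

```python
from typing import Callable, Sequence


def outcome_reward(
    ranked: Sequence[str],
    task_score: Callable[[Sequence[str]], float],
    cutoffs: Sequence[int] = (1, 3, 5),          # hypothetical prefix sizes
    weights: Sequence[float] = (0.5, 0.25, 0.25),  # hypothetical stepped weights
) -> float:
    """Hypothetical reward: weighted gain in task score over a no-memory baseline.

    Each cutoff k triggers one interaction with the working model using only
    the top-k retrieved memories. Earlier cutoffs carry larger weights, so a
    ranking that places useful memories first earns a higher reward.
    """
    baseline = task_score([])  # outcome with no retrieved memory at all
    reward = 0.0
    for k, w in zip(cutoffs, weights):
        gain = task_score(ranked[:k]) - baseline
        reward += w * gain
    return reward


def toy_task_score(memories: Sequence[str]) -> float:
    # Toy stand-in for the working LLM: the task succeeds only if the
    # one relevant memory is present in the context.
    return 1.0 if "key fact" in memories else 0.0


good_ranking = ["key fact", "noise a", "noise b", "noise c", "noise d"]
bad_ranking = ["noise a", "noise b", "noise c", "noise d", "key fact"]

good_reward = outcome_reward(good_ranking, toy_task_score)
bad_reward = outcome_reward(bad_ranking, toy_task_score)
```

In this toy setup both rankings retrieve the same five memories, yet the ranking that surfaces the relevant memory first receives the larger reward. That is the kind of rank sensitivity the abstract attributes to its stepped contribution scheme.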
