According to Me: Long-Term Personalized Referential Memory QA

Abstract

Personalized AI assistants must recall and reason over long-term user memory, which naturally spans multiple modalities and sources such as images, videos, and emails. However, existing long-term memory benchmarks focus primarily on dialogue history and fail to capture realistic personalized references grounded in lived experience. We introduce ATM-Bench, the first benchmark for multimodal, multi-source personalized referential memory QA. ATM-Bench contains approximately four years of privacy-preserving personal memory data and human-annotated question-answer pairs with ground-truth memory evidence, including queries that require resolving personal references, reasoning over multiple pieces of evidence from multiple sources, and handling conflicting evidence. We propose Schema-Guided Memory (SGM) to structurally represent memory items originating from different sources. In experiments, we implement five state-of-the-art memory systems along with a standard RAG baseline and evaluate variants with different memory ingestion, retrieval, and answer generation techniques. We find poor performance (under 20% accuracy) on the ATM-Bench-Hard set, and that SGM improves performance over the Descriptive Memory commonly adopted in prior work. Code is available at https://github.com/JingbiaoMei/ATM-Bench
