
**KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions**

January 8, 2026
Authors: Tingyu Wu, Zhisheng Chen, Ziyan Weng, Shuhe Wang, Chenglong Li, Shuo Zhang, Sen Hu, Silin Wu, Qizhen Lan, Huacan Wang, Ronghao Chen
cs.AI

Abstract

Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, in which actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at https://github.com/QuantaAlpha/KnowMeBench.
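
To make the abstract's description of a "flashback-aware, time-anchored stream" and "evidence-linked questions" concrete, here is a minimal Python sketch of how such benchmark items could be represented. The class and field names (`NarrativeEvent`, `BenchmarkQuestion`, `evidence_ids`, the three category labels) are hypothetical illustrations, not the released KnowMeBench schema; consult the repository for the actual format.

```python
# Hypothetical sketch of a time-anchored, evidence-linked benchmark item.
# All names and fields are illustrative assumptions, not the official schema.
from dataclasses import dataclass, field
from typing import List


@dataclass
class NarrativeEvent:
    """One time-anchored segment of the reconstructed narrative stream."""
    event_id: str
    timestamp: str      # anchor on the narrative timeline, e.g. "1998-06"
    is_flashback: bool  # True if the passage recounts an earlier period
    text: str           # the narrative passage itself


@dataclass
class BenchmarkQuestion:
    """An evidence-linked question posed over the narrative stream."""
    question_id: str
    category: str       # e.g. "factual_recall", "subjective_state", "principle_reasoning"
    question: str
    reference_answer: str
    evidence_ids: List[str] = field(default_factory=list)  # ids of supporting NarrativeEvents


def evidence_for(question: BenchmarkQuestion,
                 stream: List[NarrativeEvent]) -> List[NarrativeEvent]:
    """Collect the time-anchored events that ground a question's reference answer."""
    wanted = set(question.evidence_ids)
    return [event for event in stream if event.event_id in wanted]
```

Linking each question to explicit event ids is what lets an evaluator distinguish retrieval failures (the evidence was never surfaced) from reasoning failures (the evidence was available but the temporal or principle-level inference was wrong).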