**KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions**
January 8, 2026
Authors: Tingyu Wu, Zhisheng Chen, Ziyan Weng, Shuhe Wang, Chenglong Li, Shuo Zhang, Sen Hu, Silin Wu, Qizhen Lan, Huacan Wang, Ronghao Chen
cs.AI
Abstract
Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, where actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at https://github.com/QuantaAlpha/KnowMeBench.
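To make the abstract's description concrete, the sketch below shows one plausible shape for a time-anchored narrative event and an evidence-linked question record. This is a minimal illustration, not the paper's schema: all field names (anchor_date, is_flashback, evidence_ids, question_type, and so on) are assumptions; the actual data format is defined in the repository linked above.

```python
# Minimal sketch (illustrative only) of a flashback-aware, time-anchored event
# and an evidence-linked evaluation question. Field names are assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class NarrativeEvent:
    event_id: str       # unique id within the reconstructed stream
    anchor_date: str    # when the event actually happened (ISO date)
    narrated_at: str    # position in the narrative; differs from anchor_date for flashbacks
    is_flashback: bool  # True if narration order breaks chronology
    text: str           # the passage: actions, context, inner thoughts


@dataclass
class EvidenceLinkedQuestion:
    question: str
    question_type: str  # e.g. "factual_recall" | "subjective_state" | "principle_reasoning"
    evidence_ids: List[str] = field(default_factory=list)  # events a correct answer must rest on
    reference_answer: str = ""


# Purely hypothetical example record:
q = EvidenceLinkedQuestion(
    question="Why did the narrator decline the promotion?",
    question_type="principle_reasoning",
    evidence_ids=["e_1998_03", "e_1995_11"],
    reference_answer="A stable preference for autonomy over status, voiced in earlier entries.",
)
print(q.question_type, q.evidence_ids)
```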