Beyond Recall: Behavioral Specifications as an Interpretive Layer for AI Personalization

Abstract

If an AI agent makes decisions on a person's behalf, those decisions must align with that person's goals. We introduce representational accuracy, a measure of how faithfully a system captures a person's interpretation, and operationalize the interpretive layer as a Behavioral Specification. Our reference implementation aggressively compresses a person's data into interpretive patterns that are served as context to a language model. We evaluate the Specification on a prototype benchmark of held-out behavioral predictions scored by a calibrated panel of five LLM judges. We test it both on its own and in composition with a range of context conditions: the full raw corpus, the full set of extracted facts, and four commercial memory systems (Mem0, Letta, Supermemory, Zep). Across 14 public-domain autobiographical corpora, the Specification lifts representational accuracy in aggregate and nearly eliminates model hedging. It recovers most of what the raw corpus delivers at roughly 25x lower context cost. The Specification lifts subjects toward a common predictive level regardless of their pretraining baseline; the lift in absolute points is therefore largest where the baseline is lowest, which suggests that the relevant population is anyone not adequately represented in pretraining. Lift is greatest on interpretation-required questions, where an interpretive layer elicits model behavior that neither extracted facts nor the raw corpus elicit. Conversely, on recall-required questions the layer can interfere rather than help. We conclude that representational accuracy is distinct from recall and that human-AI alignment depends on how accurately the user is represented. Representational accuracy makes that alignment testable.
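As a reading aid for the ceiling argument above, the following minimal Python sketch shows how panel scores and absolute lift could be computed. It is not the benchmark's implementation: the abstract does not state how the five judges' scores are combined or what scale they use, so the median aggregation, the 0 to 100 scale, and the names `panel_score` and `absolute_lift` are all assumptions, and every number is a made-up placeholder rather than a reported result.

```python
from statistics import median

PANEL_SIZE = 5  # the abstract describes a 5-judge LLM panel


def panel_score(judge_scores):
    """Combine one prediction's judge scores into a single score.

    Assumption: the median is used. The abstract does not specify the
    aggregation rule.
    """
    if len(judge_scores) != PANEL_SIZE:
        raise ValueError(f"expected {PANEL_SIZE} judge scores")
    return median(judge_scores)


def absolute_lift(baseline, with_spec):
    """Lift in absolute points: score with the Specification minus baseline."""
    return with_spec - baseline


# Hypothetical placeholder scores on an assumed 0-100 scale (NOT paper results).
# Both subjects end near a common level, so the low-baseline subject gains more.
subjects = {
    "low_baseline_subject": {"baseline": 30.0, "with_spec": 70.0},
    "high_baseline_subject": {"baseline": 55.0, "with_spec": 72.0},
}

lifts = {
    name: absolute_lift(scores["baseline"], scores["with_spec"])
    for name, scores in subjects.items()
}

for name, lift in lifts.items():
    print(f"{name}: lift = {lift:.1f} points")
```

With these placeholder values the two subjects finish within two points of each other, yet the absolute lift differs by more than a factor of two, which is the pattern the abstract describes when subjects converge toward a common predictive level.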