Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization

Abstract

If an AI agent makes decisions on a person's behalf, those decisions must align with that person. We introduce representational accuracy to measure how faithfully a system captures a person's interpretation. We operationalize the interpretive layer as a Behavioral Specification. Our reference implementation aggressively compresses a person's data into interpretive patterns, which are served as context to a language model. We evaluate the Specification on a prototype benchmark of held-out behavioral predictions scored by a calibrated 5-judge LLM panel. We test it on its own and in combination with a range of context conditions: full raw corpus, full extracted facts, and four commercial memory systems (Mem0, Letta, Supermemory, Zep). Across 14 public-domain autobiographical corpora, the Specification lifts representational accuracy in aggregate and nearly eliminates model hedging. It recovers most of what the raw corpus delivers at roughly 25 times lower context cost. The Specification lifts subjects toward a common predictive level regardless of pretraining baseline; the lift in absolute points is therefore largest where the baseline is lowest, suggesting that the population of relevance is anyone not adequately represented in pretraining. Lift is greatest on interpretation-required questions, where providing an interpretive layer enables model behavior that extracted facts or the raw corpus alone do not. Conversely, on recall-required questions, this layer can interfere rather than help. We conclude that representational accuracy is distinct from recall and that human-AI alignment depends on how accurately the user is represented. Representational accuracy makes that alignment testable.