ATANT: An Evaluation Framework for AI Continuity

Abstract

We present ATANT (Automated Test for Acceptance of Narrative Truth), an open evaluation framework for measuring continuity in AI systems: the ability to persist, update, disambiguate, and reconstruct meaningful context across time. The AI industry has produced memory components (RAG pipelines, vector databases, long context windows, profile layers), but no published framework formally defines or measures whether these components produce genuine continuity. We define continuity as a system property with 7 required properties, introduce a 10-checkpoint evaluation methodology that operates without an LLM in the evaluation loop, and present a narrative test corpus of 250 stories comprising 1,835 verification questions across 6 life domains. We evaluate a reference implementation across 5 test suite iterations. In isolated mode (250 stories), it progresses from 58% (legacy architecture) to 100%; in cumulative mode, it reaches 100% at 50 stories and 96% at 250 stories. The cumulative result is the primary measure: when 250 distinct life narratives coexist in the same database, the system must retrieve the correct fact for the correct context without cross-contamination. ATANT is system-agnostic and model-independent, and it is designed as a sequenced methodology for building and validating continuity systems. The framework specification, example stories, and evaluation protocol are available at https://github.com/Kenotic-Labs/ATANT. The full 250-story corpus will be released incrementally.
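The difference between isolated and cumulative evaluation can be illustrated with a small sketch. Everything in it is hypothetical and chosen only for illustration: the `Story` structure, the two toy stores, the `evaluate` function, and the two-story corpus are not taken from the ATANT specification or its reference implementation. Scoring uses exact string matching, which is one simple way to keep an LLM out of the evaluation loop; the framework's actual checkpoints may work differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Story:
    """Hypothetical story: facts it establishes and questions with expected answers."""
    story_id: str
    facts: Dict[str, str]
    questions: Dict[str, str]


class LeakyStore:
    """Toy store that keys facts by name only, so stories can overwrite each other."""

    def __init__(self) -> None:
        self._facts: Dict[str, str] = {}

    def ingest(self, story: Story) -> None:
        self._facts.update(story.facts)

    def answer(self, story_id: str, key: str) -> Optional[str]:
        return self._facts.get(key)


class ScopedStore:
    """Toy store that keys facts by (story, name), keeping narratives separate."""

    def __init__(self) -> None:
        self._facts: Dict[tuple, str] = {}

    def ingest(self, story: Story) -> None:
        for key, value in story.facts.items():
            self._facts[(story.story_id, key)] = value

    def answer(self, story_id: str, key: str) -> Optional[str]:
        return self._facts.get((story_id, key))


def evaluate(stories, make_store: Callable, cumulative: bool) -> float:
    """Return the fraction of questions answered by exact match (no LLM judge).

    Isolated mode gives each story a fresh store; cumulative mode ingests
    every story into one shared store before any question is asked.
    """
    shared = None
    if cumulative:
        shared = make_store()
        for story in stories:
            shared.ingest(story)

    total = correct = 0
    for story in stories:
        if cumulative:
            store = shared
        else:
            store = make_store()
            store.ingest(story)
        for key, expected in story.questions.items():
            total += 1
            if store.answer(story.story_id, key) == expected:
                correct += 1
    return correct / total if total else 0.0


# Two invented stories that share the fact name "pet".
STORIES = [
    Story("s1", {"pet": "cat", "city": "Oslo"}, {"pet": "cat", "city": "Oslo"}),
    Story("s2", {"pet": "dog", "job": "nurse"}, {"pet": "dog", "job": "nurse"}),
]

if __name__ == "__main__":
    print(evaluate(STORIES, LeakyStore, cumulative=False))   # 1.0
    print(evaluate(STORIES, LeakyStore, cumulative=True))    # 0.75
    print(evaluate(STORIES, ScopedStore, cumulative=True))   # 1.0
```

In this toy setup the leaky store scores perfectly in isolated mode and loses a question only once both stories share a database, which is the kind of cross-contamination the cumulative result is meant to expose.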
