
ATANT: An Evaluation Framework for AI Continuity

April 8, 2026
Author: Samuel Sameer Tanguturi
cs.AI

Abstract

We present ATANT (Automated Test for Acceptance of Narrative Truth), an open evaluation framework for measuring continuity in AI systems: the ability to persist, update, disambiguate, and reconstruct meaningful context across time. While the AI industry has produced memory components (RAG pipelines, vector databases, long context windows, profile layers), no published framework formally defines or measures whether these components produce genuine continuity. We define continuity as a system property with 7 required properties, introduce a 10-checkpoint evaluation methodology that operates without an LLM in the evaluation loop, and present a narrative test corpus of 250 stories comprising 1,835 verification questions across 6 life domains. We evaluate a reference implementation across 5 test-suite iterations, progressing from 58% (legacy architecture) to 100% in isolated mode (250 stories), 100% in 50-story cumulative mode, and 96% at 250-story cumulative scale. The cumulative result is the primary measure: when 250 distinct life narratives coexist in the same database, the system must retrieve the correct fact for the correct context without cross-contamination. ATANT is system-agnostic, model-independent, and designed as a sequenced methodology for building and validating continuity systems. The framework specification, example stories, and evaluation protocol are available at https://github.com/Kenotic-Labs/ATANT. The full 250-story corpus will be released incrementally.
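The abstract does not spell out how cumulative-mode scoring or the LLM-free checks work, so the following is only a minimal, hypothetical Python sketch of the idea: every story is ingested into one shared store before any question is asked, and answers are graded by a deterministic normalized substring match instead of an LLM judge. `Question`, `ScopedMemory`, `UnscopedMemory`, `check`, and `evaluate_cumulative` are illustrative assumptions, not ATANT's 10-checkpoint protocol or the reference implementation's API; the actual evaluation protocol is published in the repository linked above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    story_id: str  # which narrative the question belongs to
    key: str       # hypothetical fact slot, e.g. "sister_name"
    expected: str  # ground-truth answer taken from that story


def check(answer: str, expected: str) -> bool:
    # Deterministic grading (no LLM in the loop): normalized substring match.
    return expected.strip().lower() in answer.strip().lower()


class ScopedMemory:
    """Toy store (hypothetical) that keys every fact by (story_id, slot)."""

    def __init__(self) -> None:
        self.facts: dict[tuple[str, str], str] = {}

    def ingest(self, story_id: str, key: str, value: str) -> None:
        self.facts[(story_id, key)] = value

    def answer(self, story_id: str, key: str) -> str:
        return self.facts.get((story_id, key), "")


class UnscopedMemory(ScopedMemory):
    """Toy store (hypothetical) that ignores story_id, so later stories
    overwrite earlier ones: a simple model of cross-contamination."""

    def ingest(self, story_id: str, key: str, value: str) -> None:
        self.facts[("*", key)] = value

    def answer(self, story_id: str, key: str) -> str:
        return self.facts.get(("*", key), "")


def evaluate_cumulative(memory, stories, questions) -> float:
    # Cumulative mode: load all stories into one shared store first,
    # then ask every question, so other narratives can interfere.
    for story_id, facts in stories.items():
        for key, value in facts.items():
            memory.ingest(story_id, key, value)
    correct = sum(check(memory.answer(q.story_id, q.key), q.expected) for q in questions)
    return correct / len(questions)


# Usage with two made-up stories that reuse the same fact slots.
stories = {
    "story_a": {"sister_name": "Priya", "city": "Austin"},
    "story_b": {"sister_name": "Maya", "city": "Leeds"},
}
questions = [Question(sid, key, value) for sid, facts in stories.items() for key, value in facts.items()]
print(evaluate_cumulative(ScopedMemory(), stories, questions))    # 1.0
print(evaluate_cumulative(UnscopedMemory(), stories, questions))  # 0.5
```

Both toy stores receive identical facts; only the scoped one keeps the two narratives apart, which is the kind of failure that isolated-mode testing cannot reveal and that a shared, cumulative database is designed to surface.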
April 15, 2026