A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation

January 14, 2026
Authors: Jian Zhang, Yu He, Zhiyuan Wang, Zhangqi Wang, Kai He, Fangzhi Xu, Qika Lin, Jun Liu
cs.AI

Abstract

Scientific reasoning relies not only on logical inference but also on activating prior knowledge and experiential structures. Memory enables efficient reuse of knowledge and improves the consistency and stability of reasoning. However, existing benchmarks mainly evaluate final answers or step-by-step coherence, overlooking the memory-driven mechanisms that underlie human reasoning: activating anchors and attractors, then integrating them into multi-step inference. To address this gap, we propose A^3-Bench (https://a3-bench.github.io), a benchmark designed to evaluate scientific reasoning through dual-scale memory-driven activation, grounded in Anchor and Attractor Activation. First, we annotate 2,198 science reasoning problems across domains using the SAPM process (subject, anchor & attractor, problem, and memory developing). Second, we introduce a dual-scale memory evaluation framework utilizing anchors and attractors, along with the AAUI (Anchor-Attractor Utilization Index) metric to measure memory activation rates. Finally, through experiments with various base models and paradigms, we validate A^3-Bench and analyze how memory activation impacts reasoning performance, providing insights into memory-driven scientific reasoning.