A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation

January 14, 2026
Authors: Jian Zhang, Yu He, Zhiyuan Wang, Zhangqi Wang, Kai He, Fangzhi Xu, Qika Lin, Jun Liu
cs.AI

Abstract

Scientific reasoning relies not only on logical inference but also on activating prior knowledge and experiential structures. Memory allows knowledge to be reused efficiently and improves the consistency and stability of reasoning. However, existing benchmarks mainly evaluate final answers or step-by-step coherence, overlooking the memory-driven mechanisms that underlie human reasoning, which first activate anchors and attractors and then integrate them into multi-step inference. To address this gap, we propose A^3-Bench (https://a3-bench.github.io), a benchmark grounded in Anchor and Attractor Activation that evaluates scientific reasoning through dual-scale memory-driven activation. First, we annotate 2,198 science reasoning problems across domains using the SAPM process (subject, anchor & attractor, problem, and memory developing). Second, we introduce a dual-scale memory evaluation framework built on anchors and attractors, along with the AAUI (Anchor-Attractor Utilization Index) metric to measure memory activation rates. Finally, through experiments with various base models and paradigms, we validate A^3-Bench and analyze how memory activation affects reasoning performance, providing insights into memory-driven scientific reasoning.