

Mechanistic Interpretability of Large-Scale Counting in LLMs through a System-2 Strategy

January 6, 2026
Authors: Hosein Hasani, Mohammadali Banayeeanzade, Ali Nafisi, Sadegh Mohammadian, Fatemeh Askari, Mobin Bagherian, Amirmohammad Izadi, Mahdieh Soleymani Baghshah
cs.AI

Abstract

Large language models (LLMs), despite strong performance on complex mathematical problems, exhibit systematic limitations in counting tasks. This issue arises from architectural limits of transformers, where counting is performed across layers, leading to degraded precision for larger counting problems due to depth constraints. To address this limitation, we propose a simple test-time strategy inspired by System-2 cognitive processes that decomposes large counting tasks into smaller, independent sub-problems that the model can reliably solve. We evaluate this approach using observational and causal mediation analyses to understand the underlying mechanism of this System-2-like strategy. Our mechanistic analysis identifies key components: latent counts are computed and stored in the final item representations of each part, transferred to intermediate steps via dedicated attention heads, and aggregated in the final stage to produce the total count. Experimental results demonstrate that this strategy enables LLMs to surpass architectural limitations and achieve high accuracy on large-scale counting tasks. This work provides mechanistic insight into System-2 counting in LLMs and presents a generalizable approach for improving and understanding their reasoning behavior.
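To make the strategy concrete, here is a minimal Python sketch of the decompose-then-aggregate idea, assuming a word-counting task. The function name `query_llm`, the prompt wording, and the chunk size are illustrative assumptions rather than the paper's protocol; in the paper, the partial counts are produced and aggregated within the model's own generation, whereas this sketch issues each sub-problem as a separate query for simplicity.

```python
# Minimal sketch of the decompose-then-aggregate counting strategy.
# `query_llm` is a hypothetical placeholder for any chat-completion call;
# the prompt wording and chunk size are illustrative assumptions.

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion endpoint)."""
    raise NotImplementedError

def count_with_decomposition(items: list[str], target: str, chunk_size: int = 10) -> int:
    """Count occurrences of `target` in `items` by splitting the list into
    parts small enough for the model to count reliably, then summing."""
    total = 0
    for start in range(0, len(items), chunk_size):
        part = items[start:start + chunk_size]
        prompt = (
            f"How many of the following words are exactly '{target}'? "
            f"Answer with a single integer.\nWords: {', '.join(part)}"
        )
        total += int(query_llm(prompt).strip())  # aggregate partial counts
    return total
```

The explicit summation here plays the role of the aggregation stage the abstract describes, where per-part latent counts stored in each part's final item representation are combined to produce the total count.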