Equilibrium Reasoners: Learning Attractors for Scalable Reasoning

Abstract

Scaling test-time compute by iteratively updating a latent state has emerged as a powerful paradigm for reasoning. Yet the internal mechanisms that enable these iterative models to generalize beyond memorized patterns remain unclear. We hypothesize that generalizable reasoning arises from learning task-conditioned attractors: latent dynamical systems whose stable fixed points correspond to valid solutions. We formalize this process through Equilibrium Reasoners (EqR), which enable test-time scaling without external verifiers or task-specific priors. EqR scales internal dynamics along two axes: depth, by running more iterations, and breadth, by aggregating stochastic trajectories from multiple initializations. Empirically, gains from test-time scaling are tightly coupled with stronger convergence toward solution-aligned attractors. This attractor perspective allows neural networks to adaptively allocate test-time compute based on task difficulty. While simple cases converge within 1 to 5 iteration steps, harder cases benefit from massive test-time scaling. By unrolling up to the equivalent of 40,000 layers, scalable latent reasoning boosts accuracy from 2.6% for feedforward models to over 99% on Sudoku-Extreme. These results suggest that learned attractor landscapes provide a useful mechanistic lens for understanding scalable reasoning in iterative latent models.
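The two scaling axes described above (depth and breadth) can be illustrated with a toy one-dimensional system. This is a minimal sketch, not the EqR implementation. All of the following are hypothetical and chosen only for illustration:

- the update `update`, a tanh map that has two stable fixed points (attractors) when `x = 0`;
- the residual-based stopping rule in `solve_depth`;
- the sign read-out `decode`;
- majority voting in `solve_breadth` as the way trajectories are aggregated.

```python
import math
import random
from collections import Counter


def update(z: float, x: float, beta: float = 2.0) -> float:
    """Hypothetical stand-in for a learned, input-conditioned update z_{t+1} = f(z_t; x).

    With beta > 1 and x = 0 this map has two stable fixed points (about +/-0.9575)
    and an unstable one at 0, i.e. two attractor basins.
    """
    return math.tanh(beta * z + x)


def solve_depth(z0: float, x: float, tol: float = 1e-6, max_iters: int = 10_000):
    """Depth axis: iterate until the residual |z_{t+1} - z_t| drops below tol.

    Returns the final state and the number of iterations used, so starts that
    are already close to an attractor stop early and others use more compute.
    """
    z = z0
    for t in range(1, max_iters + 1):
        z_next = update(z, x)
        if abs(z_next - z) < tol:
            return z_next, t
        z = z_next
    return z, max_iters


def decode(z: float) -> int:
    """Hypothetical read-out mapping a latent state to a discrete answer."""
    return 1 if z > 0 else -1


def solve_breadth(x: float, num_inits: int = 8, seed: int = 0):
    """Breadth axis: run several randomly initialized trajectories, then majority-vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_inits):
        z_star, _ = solve_depth(rng.uniform(-1.0, 1.0), x)
        votes[decode(z_star)] += 1
    answer, _ = votes.most_common(1)[0]
    return answer, votes


if __name__ == "__main__":
    # Same input, different starting points: compare iterations to convergence.
    for z0 in (0.95, 1e-3):
        z_star, steps = solve_depth(z0, x=0.0)
        print(f"z0={z0}: z*={z_star:.4f} after {steps} steps")
    # With x = 1.0 the map has a single (positive) attractor, so all votes agree.
    print(solve_breadth(x=1.0))
```

In this toy, a trajectory that starts near an attractor stops after a few iterations. One that starts near the unstable point at 0 needs noticeably more, which loosely mirrors the easy/hard split in the abstract. Majority voting is only one possible aggregator; the abstract does not specify how EqR combines trajectories.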