Equilibrium Reasoners: Learning Attractors for Scalable Reasoning

Abstract

Scaling test-time compute by iteratively updating a latent state has emerged as a powerful paradigm for reasoning. Yet the internal mechanisms that let these iterative models generalize beyond memorized patterns remain unclear. We hypothesize that generalizable reasoning arises from learning task-conditioned attractors: latent dynamical systems whose stable fixed points correspond to valid solutions. We formalize this process through Equilibrium Reasoners (EqR), which enable test-time scaling without external verifiers or task-specific priors. EqR scales internal dynamics along two axes: depth, by running more iterations, and breadth, by aggregating stochastic trajectories from multiple initializations. Empirically, gains from test-time scaling are tightly coupled with stronger convergence toward solution-aligned attractors. This attractor perspective allows neural networks to allocate test-time compute adaptively according to task difficulty: simple cases converge within 1 to 5 iteration steps, while harder cases benefit from massive test-time scaling. By unrolling up to the equivalent of 40,000 layers, scalable latent reasoning raises accuracy on Sudoku-Extreme from 2.6% for feedforward models to over 99%. These results suggest that learned attractor landscapes provide a useful mechanistic lens for understanding scalable reasoning in iterative latent models.
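The two scaling axes can be illustrated with a minimal toy sketch. It is not the EqR model: the learned update is replaced by a hand-written one-dimensional map with two stable fixed points, and the names `update`, `run_to_equilibrium`, and `solve` are hypothetical. The residual-based stopping rule and the majority vote used for aggregation are assumptions made for illustration; the abstract does not specify either.

```python
import random
from collections import Counter


def update(z, x, step=0.1):
    """Toy stand-in for a learned update (assumption, not the EqR network).

    Gradient step on a tilted double well; the task input x tilts the
    landscape so that one stable fixed point has the larger basin.
    """
    return z - step * (z**3 - z - x)


def run_to_equilibrium(z0, x, max_iters=10_000, tol=1e-8):
    """Depth axis: iterate until the step size |f(z) - z| drops below tol.

    Returns (final state, iterations used, converged flag). Easy starts
    stop early, so compute adapts to the instance.
    """
    z = z0
    for n in range(1, max_iters + 1):
        z_next = update(z, x)
        if abs(z_next - z) < tol:
            return z_next, n, True
        z = z_next
    return z, max_iters, False


def solve(x, num_inits=16, seed=0):
    """Breadth axis: run several random initializations and aggregate.

    Aggregation here is a majority vote over the decoded sign of each
    converged state (an assumed rule, chosen for simplicity).
    """
    rng = random.Random(seed)
    votes = Counter()
    total_iters = 0
    for _ in range(num_inits):
        z, n, converged = run_to_equilibrium(rng.uniform(-2.0, 2.0), x)
        total_iters += n
        if converged:
            votes[1 if z > 0 else -1] += 1
    answer = votes.most_common(1)[0][0] if votes else None
    return answer, total_iters


if __name__ == "__main__":
    z_near, n_near, _ = run_to_equilibrium(1.1, 0.3)   # starts close to an attractor
    z_far, n_far, _ = run_to_equilibrium(-0.3, 0.3)    # starts far from it
    print(f"near start: {n_near} iterations, far start: {n_far} iterations")
    print("aggregated answer and total iterations:", solve(0.3))
```

In this toy, a start near a stable fixed point stops after fewer iterations than a start far from it, which mirrors the adaptive allocation of compute described above; increasing `num_inits` corresponds to scaling breadth.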