Solve the Loop: Attractor Models for Language and Reasoning
May 12, 2026
Authors: Jacob Fein-Ashley, Paria Rashidinejad
cs.AI
Abstract
Looped Transformers offer a promising alternative to purely feed-forward computation by iteratively refining latent representations, improving language modeling and reasoning. Yet recurrent architectures remain unstable to train, costly to optimize and deploy, and constrained to small, fixed recurrence depths. We introduce Attractor Models, in which a backbone module first proposes output embeddings and an attractor module then refines them by solving for a fixed point, with gradients obtained through implicit differentiation. As a result, training memory remains constant in the effective depth, and the number of iterations is chosen adaptively by convergence. Empirically, Attractor Models outperform existing models across two regimes: large-scale language-model pretraining and reasoning with tiny models. In language modeling, Attractor Models deliver a Pareto improvement over standard Transformers and stable looped models across sizes, improving perplexity by up to 46.6% and downstream accuracy by up to 19.7% while reducing training cost. Notably, a 770M Attractor Model outperforms a 1.3B Transformer trained on twice as many tokens. On challenging reasoning tasks, a model with only 27M parameters and roughly 1,000 training examples achieves 91.4% accuracy on Sudoku-Extreme and 93.1% on Maze-Hard, scaling favorably where frontier models such as Claude and GPT o3 fail completely and specialized recursive reasoners collapse at larger sizes. Lastly, Attractor Models exhibit a novel phenomenon we call equilibrium internalization: fixed-point training places the model's initial output embedding near equilibrium, allowing the solver to be removed at inference time with little degradation. Together, these results suggest that Attractor Models make iterative refinement scalable by turning recurrence into a computation the model can learn to internalize.
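To make the mechanism concrete, here is a minimal PyTorch sketch of the attractor idea described above. It is not the authors' implementation: the module name AttractorBlock, its layer sizes, the plain fixed-point iteration, and the one-step "phantom" gradient used in place of exact implicit differentiation are all illustrative assumptions; the paper's actual solver, update rule, and gradient computation may differ.

```python
import torch
import torch.nn as nn

class AttractorBlock(nn.Module):
    """Illustrative sketch (not the paper's code): refine a backbone
    embedding h toward a fixed point z* = f(z*, h)."""

    def __init__(self, dim: int, max_iters: int = 50, tol: float = 1e-4):
        super().__init__()
        # Hypothetical update network; the paper's attractor module may differ.
        self.f = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )
        self.max_iters = max_iters
        self.tol = tol

    def step(self, z: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # One refinement step, conditioned on the backbone's proposal h.
        return self.f(torch.cat([z, h], dim=-1))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Solve for the fixed point without building an autograd graph,
        # so memory stays constant no matter how many steps the solver
        # takes; the stopping rule makes the depth adaptive.
        with torch.no_grad():
            z = h  # initialize at the backbone's output embedding
            for _ in range(self.max_iters):
                z_next = self.step(z, h)
                if (z_next - z).norm() <= self.tol * (z.norm() + 1e-8):
                    z = z_next
                    break
                z = z_next
        # Backward: differentiate through a single step at the equilibrium,
        # a cheap "phantom gradient" stand-in for exact implicit
        # differentiation, which would solve a linear system in I - df/dz.
        return self.step(z.detach(), h)

# Usage on dummy data:
block = AttractorBlock(dim=512)
h = torch.randn(8, 512)    # backbone's proposed output embeddings
z_star = block(h)          # refined embeddings, near the fixed point
```

Because the solver loop runs under torch.no_grad(), memory does not grow with the number of iterations, mirroring the abstract's claim that training memory stays constant in effective depth; exact implicit differentiation would instead backpropagate through the equilibrium via the implicit function theorem.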