

Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

November 10, 2025
Authors: Sean McLeish, Ang Li, John Kirchenbauer, Dayal Singh Kalra, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Jonas Geiping, Tom Goldstein, Micah Goldblum
cs.AI

Abstract

Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments on mathematics, we observe that converting pretrained models to recurrent ones results in better performance at a given compute budget than simply post-training the original non-recurrent language model.
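To make the core idea concrete, below is a minimal, hypothetical sketch (not the authors' code) of a depth-recurrent model: a shared block is applied `num_recurrences` times, so effective depth can grow at test time without adding parameters, and a simple curriculum ramps the recurrence count up over training. The names (`RecurrentBlock`, `DepthRecurrentModel`, `recurrence_for_step`) and the toy MLP block are illustrative assumptions; the paper retrofits actual pretrained transformer layers.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumed, not the authors' implementation): a shared block
# applied r times, with r following a training-time curriculum.

class RecurrentBlock(nn.Module):
    """Stand-in for a transformer block that is reused across recurrence steps."""
    def __init__(self, dim: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual update so repeated application stays stable.
        return h + self.ff(self.norm(h))

class DepthRecurrentModel(nn.Module):
    """Applies one shared block `num_recurrences` times: effective depth scales
    with test-time compute while the parameter count stays fixed."""
    def __init__(self, dim: int):
        super().__init__()
        self.block = RecurrentBlock(dim)

    def forward(self, h: torch.Tensor, num_recurrences: int) -> torch.Tensor:
        for _ in range(num_recurrences):
            h = self.block(h)
        return h

def recurrence_for_step(step: int, max_recurrences: int, warmup_steps: int) -> int:
    """Hypothetical curriculum: linearly ramp the recurrence count from 1 to
    `max_recurrences` over the first `warmup_steps` optimizer steps."""
    frac = min(step / max(warmup_steps, 1), 1.0)
    return max(1, round(1 + frac * (max_recurrences - 1)))

if __name__ == "__main__":
    model = DepthRecurrentModel(dim=64)
    x = torch.randn(2, 16, 64)  # (batch, sequence, hidden)
    for step in (0, 500, 1000):
        r = recurrence_for_step(step, max_recurrences=8, warmup_steps=1000)
        y = model(x, num_recurrences=r)
        print(f"step {step}: recurrences={r}, output shape={tuple(y.shape)}")
```

In this toy setup the curriculum starts training at a small effective depth and only later reaches the full recurrence count, which is the mechanism the abstract credits with preserving performance while lowering total training compute.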