RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Abstract

Reasoning requires going beyond pattern matching or memorization of solutions to identify and implement "algorithmic procedures" that can be used to deduce answers to hard problems. Doing so requires recognizing the most relevant primitives, intermediate results, or shared procedures, and building upon them. While RL post-training on long chains of thought ultimately aims to uncover this kind of algorithmic behavior, most reasoning traces learned by large models fail to consistently capture or reuse procedures, instead drifting into verbose and degenerate exploration. To make reasoning more effective, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward learning successful reasoning. We train models to propose multiple abstractions for a given problem, and then apply RL that rewards building a solution using the information these abstractions provide. This results in a two-player RL training paradigm, abbreviated as RLAD, that jointly trains an abstraction generator and a solution generator. This setup enables structured exploration, decouples the learning signals for abstraction proposal and solution generation, and improves generalization to harder problems. We also show that, at large test-time budgets, allocating more compute to generating abstractions improves performance more than generating additional solutions, illustrating the role of abstractions in guiding meaningful exploration.
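The data flow of the two-player setup can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not specify the reward design, so the sketch assumes a 0/1 correctness reward for each solution and rewards each abstraction by the average correctness of the solutions conditioned on it. All function names (`rlad_rollout`, `abstraction_rewards`, `toy_propose`, `toy_solve`) are hypothetical, and the two "generators" are hard-coded toys standing in for trained models.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces for the two players (assumptions, not from the paper).
Proposer = Callable[[str, int], List[str]]   # (problem, m) -> m abstractions
Solver = Callable[[str, str, int], str]      # (problem, abstraction, sample index) -> answer


def rlad_rollout(
    problem: str,
    answer: str,
    propose: Proposer,
    solve: Solver,
    num_abstractions: int,
    solutions_per_abstraction: int,
) -> List[Tuple[str, List[float]]]:
    """Sample abstractions, then sample solutions conditioned on each one.

    Returns (abstraction, per-solution 0/1 rewards) pairs.
    """
    rollout = []
    for abstraction in propose(problem, num_abstractions):
        rewards = [
            1.0 if solve(problem, abstraction, i) == answer else 0.0
            for i in range(solutions_per_abstraction)
        ]
        rollout.append((abstraction, rewards))
    return rollout


def abstraction_rewards(rollout: List[Tuple[str, List[float]]]) -> List[float]:
    """Assumed reward for the abstraction generator: mean solution accuracy."""
    return [sum(rewards) / len(rewards) for _, rewards in rollout]


def toy_propose(problem: str, m: int) -> List[str]:
    """Toy abstraction generator: returns up to m fixed hints."""
    hints = [
        "Split 17 as 20 - 3 and distribute over 23.",
        "Estimate by rounding both factors.",
    ]
    return hints[:m]


def toy_solve(problem: str, abstraction: str, sample_index: int) -> str:
    """Toy solution generator: always correct with the useful hint,
    correct only on its first sample otherwise."""
    if "20 - 3" in abstraction:
        return "391"
    return "391" if sample_index == 0 else "390"


if __name__ == "__main__":
    rollout = rlad_rollout("17 * 23", "391", toy_propose, toy_solve, 2, 4)
    for (abstraction, _), reward in zip(rollout, abstraction_rewards(rollout)):
        print(f"{reward:.2f}  {abstraction}")
```

In this toy, the test-time budget is `num_abstractions * solutions_per_abstraction`, so the trade-off mentioned in the abstract corresponds to how that product is split between the two factors.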

