Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
May 25, 2025
Authors: Jaemin Kim, Hangeol Chang, Hyunmin Hwang, Choonghan Kim, Jong Chul Ye
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise their generalization. While Parameter-Efficient Fine-Tuning (PEFT) methods offer a more resource-conscious alternative, they typically require retraining for each LLM backbone due to architectural dependencies. To address these challenges, here we propose Universal Reasoner (UniR) - a single, lightweight, composable, and plug-and-play reasoning module that can be used with any frozen LLM to endow it with specialized reasoning capabilities. Specifically, UniR decomposes the reward into a standalone reasoning module that is trained independently using predefined rewards, effectively translating trajectory-level signals into token-level guidance. Once trained, UniR can be combined with any frozen LLM at inference time by simply adding its output logits to those of the LLM backbone. This additive structure naturally enables modular composition: multiple UniR modules trained for different tasks can be jointly applied by summing their logits, enabling complex reasoning via composition. Experimental results on mathematical reasoning and machine translation tasks show that UniR significantly outperforms existing baseline fine-tuning methods using the Llama3.2 model. Furthermore, UniR demonstrates strong weak-to-strong generalization: reasoning modules trained on smaller models effectively guide much larger LLMs. This makes UniR a cost-efficient, adaptable, and robust solution for enhancing reasoning in LLMs without compromising their core capabilities. Code is open-sourced at https://github.com/hangeol/UniR.
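For concreteness, the inference-time combination described in the abstract can be sketched as follows. This is a minimal illustration rather than the paper's exact implementation: it assumes Hugging Face transformers-style causal LMs that share a vocabulary with the backbone, and the checkpoint paths, the guidance weight `alpha`, and greedy decoding are illustrative choices, not values taken from the paper.

```python
# Minimal sketch: at each decoding step, add the logits of one or more
# reasoning modules to the logits of a frozen backbone before picking
# the next token. Model paths, `alpha`, and greedy decoding are assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

backbone_name = "meta-llama/Llama-3.2-3B-Instruct"            # frozen backbone (assumed)
reasoner_names = ["path/to/unir-math", "path/to/unir-mt"]      # trained modules (placeholders)

# Assumes backbone and reasoners use the same tokenizer/vocabulary.
tokenizer = AutoTokenizer.from_pretrained(backbone_name)
backbone = AutoModelForCausalLM.from_pretrained(backbone_name).eval()
reasoners = [AutoModelForCausalLM.from_pretrained(n).eval() for n in reasoner_names]

@torch.no_grad()
def generate_with_unir(prompt: str, max_new_tokens: int = 128, alpha: float = 1.0) -> str:
    """Greedy decoding from backbone logits plus the summed reasoner logits."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        combined = backbone(ids).logits[:, -1, :]              # backbone next-token logits
        for reasoner in reasoners:                             # modular composition: sum logits
            combined = combined + alpha * reasoner(ids).logits[:, -1, :]
        next_id = combined.argmax(dim=-1, keepdim=True)        # greedy pick from combined logits
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```

Because the reasoner only contributes in logit space, the backbone stays frozen and can in principle be swapped for a larger model without retraining the module, which is the weak-to-strong setting the abstract refers to.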